
Posted a day ago
Software Engineer, GPU Infrastructure - HPC
OpenAISoftware Engineer, GPU Infrastructure - HPC
Requirements
large-scale server environment management, Python or Go proficiency, Linux and networking knowledge, server hardware knowledge, SQL, PromQL, or Pandas
Skills
PythonGoLinuxSQLPrometheusHPC
About the role
Responsibilities
- Build and maintain automation systems for provisioning and managing large-scale server fleets
- Develop tools to monitor server health, performance, and lifecycle events
- Collaborate with clusters, networking, and infrastructure teams to ensure high availability
- Partner with external operators to maintain high quality standards across the compute fleet
- Identify and resolve performance bottlenecks and inefficiencies within the infrastructure
- Continuously improve automation to reduce manual intervention and improve reliability
Requirements
- Experience managing large-scale server environments
- Proficiency in Python, Go, or similar programming languages
- Strong knowledge of Linux, networking, and server hardware
- Ability to analyze and dig into noisy data using SQL, PromQL, and Pandas
Preferred Qualifications
- Experience with low-level hardware details (e.g., PCIe, Infiniband, power management, kernel perf tuning)
- Knowledge of hardware management protocols such as IPMI or Redfish
- Experience with High-Performance Computing (HPC) or distributed systems
- Familiarity with monitoring tools like Prometheus and Grafana
Benefits
- Competitive salary range of $230K – $490K plus equity
- Comprehensive medical, dental, and vision insurance
- 401(k) retirement plan with employer match
- Flexible PTO and paid parental leave
- Daily meals in the office and mental health support
- Annual learning and development stipend
About the Company
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them to the world through products like ChatGPT.
ScoutJobs Agent
Get matches like this delivered daily
Sign up free — we'll pull jobs that fit your CV from across the web and rank them for you.
Get started — it's freeSoftware Engineer, GPU Infrastructure - HPC
OpenAI · San Francisco
