
Posted a day ago
Software Engineer, Frontier Clusters Infrastructure
OpenAI
Perks & benefits
Medical Insurance, Health Insurance, Housing Allowance, Mobile Allowance, Paid Leave
Requirements
Experience in infrastructure or distributed systems engineering, Deep knowledge of Kubernetes internals, Proficiency in Python or Go, Familiarity with Infrastructure-as-Code tools, Experience with bare-metal Linux environments
Skills
Kubernetes, Python, Go, Terraform, Linux, Distributed Systems
About the role
Responsibilities
- Spin up and scale large Kubernetes clusters, including automation for provisioning, bootstrapping, and lifecycle management
- Build software abstractions that unify multiple clusters to present a seamless interface to training workloads
- Own node bring-up from bare metal through firmware upgrades to ensure repeatable deployment at scale
- Improve operational metrics, such as reducing cluster restart times and accelerating upgrade cycles
- Integrate networking and hardware health systems to deliver end-to-end reliability across servers and switches
- Develop monitoring and observability systems to detect issues early and maintain stability under extreme load
Requirements
- Experience as an infrastructure, systems, or distributed systems engineer in large-scale or high-availability environments
- Deep knowledge of Kubernetes internals, cluster scaling patterns, and containerized workloads
- Proficiency in Python, Go, or similar programming languages
- Familiarity with Infrastructure-as-Code tools such as Terraform or CloudFormation
- Experience with bare-metal Linux environments, GPU hardware, and large-scale networking
Preferred Qualifications
- Background with GPU workloads or high-performance computing (HPC)
- Experience with firmware management and hardware-level automation
Benefits
- Competitive salary range of $230K – $490K plus equity
- Comprehensive medical, dental, and vision insurance
- 401(k) retirement plan with employer match
- Paid parental leave and flexible PTO
- Daily meals in the office and meal delivery credits
- Annual learning and development stipend
- Relocation support for eligible employees
About the Company
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them to the world through our products.
OpenAI · San Francisco
