Software Engineer, Frontier Clusters Infrastructure at OpenAI - ScoutJobs - The AI-curated global job board
Skip to content
OpenAI
Posted a day ago

Software Engineer, Frontier Clusters Infrastructure

OpenAISoftware Engineer, Frontier Clusters Infrastructure

Perks & benefits

Medical InsuranceHealth InsuranceHousing AllowanceMobile AllowancePaid Leave

Requirements

Experience in infrastructure or distributed systems engineering, Deep knowledge of Kubernetes internals, Proficiency in Python or Go, Familiarity with Infrastructure-as-Code tools, Experience with bare-metal Linux environments

Skills

KubernetesPythonGoTerraformLinuxDistributed Systems

About the role

Responsibilities

  • Spin up and scale large Kubernetes clusters, including automation for provisioning, bootstrapping, and lifecycle management
  • Build software abstractions that unify multiple clusters to present a seamless interface to training workloads
  • Own node bring-up from bare metal through firmware upgrades to ensure repeatable deployment at scale
  • Improve operational metrics, such as reducing cluster restart times and accelerating upgrade cycles
  • Integrate networking and hardware health systems to deliver end-to-end reliability across servers and switches
  • Develop monitoring and observability systems to detect issues early and maintain stability under extreme load

Requirements

  • Experience as an infrastructure, systems, or distributed systems engineer in large-scale or high-availability environments
  • Deep knowledge of Kubernetes internals, cluster scaling patterns, and containerized workloads
  • Proficiency in Python, Go, or similar programming languages
  • Familiarity with Infrastructure-as-Code tools such as Terraform or CloudFormation
  • Experience with bare-metal Linux environments, GPU hardware, and large-scale networking

Preferred Qualifications

  • Background with GPU workloads or high-performance computing (HPC)
  • Experience with firmware management and hardware-level automation

Benefits

  • Competitive salary range of $230K – $490K plus equity
  • Comprehensive medical, dental, and vision insurance
  • 401(k) retirement plan with employer match
  • Paid parental leave and flexible PTO
  • Daily meals in the office and meal delivery credits
  • Annual learning and development stipend
  • Relocation support for eligible employees

About the Company

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them to the world through our products.

ScoutJobs Agent

Get matches like this delivered daily

Sign up free — we'll pull jobs that fit your CV from across the web and rank them for you.

Get started — it's free

Software Engineer, Frontier Clusters Infrastructure

OpenAI · San Francisco

Sign up to apply