Software Engineer, GPU Infrastructure - HPC at OpenAI - ScoutJobs - The AI-curated global job board
Skip to content
OpenAI
Posted a day ago

Software Engineer, GPU Infrastructure - HPC

OpenAISoftware Engineer, GPU Infrastructure - HPC

Requirements

large-scale server environment management, Python or Go proficiency, Linux and networking knowledge, server hardware knowledge, SQL, PromQL, or Pandas

Skills

PythonGoLinuxSQLPrometheusHPC

About the role

Responsibilities

  • Build and maintain automation systems for provisioning and managing large-scale server fleets
  • Develop tools to monitor server health, performance, and lifecycle events
  • Collaborate with clusters, networking, and infrastructure teams to ensure high availability
  • Partner with external operators to maintain high quality standards across the compute fleet
  • Identify and resolve performance bottlenecks and inefficiencies within the infrastructure
  • Continuously improve automation to reduce manual intervention and improve reliability

Requirements

  • Experience managing large-scale server environments
  • Proficiency in Python, Go, or similar programming languages
  • Strong knowledge of Linux, networking, and server hardware
  • Ability to analyze and dig into noisy data using SQL, PromQL, and Pandas

Preferred Qualifications

  • Experience with low-level hardware details (e.g., PCIe, Infiniband, power management, kernel perf tuning)
  • Knowledge of hardware management protocols such as IPMI or Redfish
  • Experience with High-Performance Computing (HPC) or distributed systems
  • Familiarity with monitoring tools like Prometheus and Grafana

Benefits

  • Competitive salary range of $230K – $490K plus equity
  • Comprehensive medical, dental, and vision insurance
  • 401(k) retirement plan with employer match
  • Flexible PTO and paid parental leave
  • Daily meals in the office and mental health support
  • Annual learning and development stipend

About the Company

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them to the world through products like ChatGPT.

ScoutJobs Agent

Get matches like this delivered daily

Sign up free — we'll pull jobs that fit your CV from across the web and rank them for you.

Get started — it's free

Software Engineer, GPU Infrastructure - HPC

OpenAI · San Francisco

Sign up to apply