Software Engineer, Fleet Hardware Health at OpenAI
OpenAI
Posted a day ago

Software Engineer, Fleet Hardware Health

Skills

Python, Go, Linux, SQL, Prometheus, Grafana

About the role

Responsibilities

  • Build and maintain automation systems for provisioning and managing server fleets
  • Develop tools to monitor server health, performance, and lifecycle events
  • Collaborate with clusters, networking, and infrastructure teams to ensure high availability
  • Partner with external operators to maintain high quality standards
  • Identify and resolve performance bottlenecks and inefficiencies
  • Continuously improve automation to reduce manual operational work

Requirements

  • Experience managing large-scale server environments
  • Proficiency in Python, Go, or similar programming languages
  • Strong knowledge of Linux, networking, and server hardware
  • Ability to perform data analysis using SQL, PromQL, and Pandas

Preferred Qualifications

  • Experience with low-level hardware details (PCIe, InfiniBand, power management, kernel performance tuning)
  • Knowledge of hardware management protocols such as IPMI or Redfish
  • Experience with High-Performance Computing (HPC) or distributed systems
  • Familiarity with monitoring tools like Prometheus and Grafana

Benefits

  • Competitive salary range of $230K – $490K plus equity
  • Comprehensive medical, dental, and vision insurance
  • 401(k) retirement plan with employer match
  • Flexible PTO and paid parental leave
  • Daily meals in the office and mental health support
  • Annual learning and development stipend

About the Company

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them to the world through products like ChatGPT.

OpenAI · San Francisco
