Software Engineer, Infrastructure Reliability at OpenAI - ScoutJobs - The AI-curated global job board
OpenAI
Posted 3 days ago

Software Engineer, Infrastructure Reliability

Requirements

  • 4+ years industry experience
  • 2+ years leading large scale projects
  • Proficiency in cloud infrastructure (AWS, GCP, or Azure)
  • Experience with Kubernetes and Terraform
  • Experience with observability tools (Datadog, Prometheus, or Grafana)
  • Knowledge of microservices and service mesh

Skills

Kubernetes, Terraform, AWS, Distributed Systems, Linux

About the role

Responsibilities

  • Design, build, and operate reliable and performant systems used across engineering
  • Identify and fix performance bottlenecks to ensure infrastructure scales to the next order of magnitude
  • Improve automation, internal tooling, and developer experience to reduce manual work
  • Contribute to incident response, postmortems, and the development of best practices for system reliability
  • Collaborate with infra, product, and research teams to turn complex infrastructure into reliable platforms

Requirements

  • 4+ years of relevant industry experience
  • 2+ years leading large-scale, complex projects or teams as an engineer or tech lead
  • Proficiency in cloud infrastructure (AWS, GCP, or Azure) and IaC tools like Terraform
  • Experience with containerization and orchestration platforms such as Kubernetes
  • Experience with observability tools (Datadog, Prometheus, Grafana, Splunk, or ELK stack)
  • Knowledge of microservices architecture and service mesh technologies
  • Strong understanding of distributed systems, networking, and database technologies

Benefits

  • Competitive salary ($255K – $405K) and generous equity
  • Medical, dental, and vision insurance for you and your family
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents)
  • Flexible PTO and 13+ paid company holidays
  • Daily meals in the office and meal delivery credits
  • Annual learning and development stipend
  • Relocation support for eligible employees

About the Company

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.

OpenAI · San Francisco