Software Engineer, Reliability at OpenAI - ScoutJobs - The AI-curated global job board
OpenAI
Posted 3 days ago

Software Engineer, Reliability

Requirements

  • Bachelor's degree in Computer Science or a related field
  • Experience as a software engineer focused on reliability
  • Proficiency in cloud infrastructure
  • Proficiency in programming languages
  • Experience with Kubernetes
  • Knowledge of Terraform or CloudFormation
  • Experience with Datadog, Prometheus, Grafana, or Splunk
  • Experience with microservices architecture

Skills

Kubernetes, Terraform, Microservices

About the role

Responsibilities

  • Design and implement solutions that keep infrastructure scaling to meet rapidly increasing demand
  • Build and maintain load, chaos, and synthetic testing software to improve system reliability
  • Develop automation tools to streamline repetitive tasks and improve system efficiency
  • Maintain the platform for CPU, storage, GPU, and network lifecycle management
  • Implement fault-tolerant and resilient design patterns to minimize service disruptions
  • Develop and maintain service level objectives (SLOs) and service level indicators (SLIs)
  • Partner with researchers, engineers, and product managers to bring new features to the world
  • Participate in an on-call rotation to respond to critical incidents

Requirements

  • Bachelor's degree in Computer Science, Information Technology, or a related field
  • Proven experience as a Software Engineer focused on reliability in a fast-paced, scaling environment
  • Strong proficiency in cloud infrastructure and programming languages
  • Experience with containerization and orchestration platforms like Kubernetes
  • Knowledge of Infrastructure as Code (IaC) tools such as Terraform or CloudFormation
  • Experience with observability tools like Datadog, Prometheus, Grafana, or Splunk
  • Experience with microservices architecture and service mesh technologies

Benefits

  • Competitive salary range of $230K – $490K plus equity
  • Medical, dental, and vision insurance for you and your family
  • 401(k) retirement plan with employer match
  • Paid parental leave and paid medical/caregiver leave
  • Flexible PTO and 13+ paid company holidays
  • Mental health and wellness support
  • Daily meals in the office and relocation support for eligible employees

About the Company

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of AI capabilities and seek to safely deploy them to the world through our products.

OpenAI · San Francisco
