
Posted a day ago
Software Engineer, Platform Systems
OpenAI
Requirements
Experience with low-level software; knowledge of hardware and operating systems; understanding of networking and concurrency; background in high-performance computing; distributed systems expertise
Skills
Distributed Systems, Systems Engineering
About the role
Responsibilities
- Design and build distributed failure detection, tracing, and profiling systems for large-scale AI training jobs
- Develop tooling to identify slow, faulty, or misbehaving nodes to provide actionable visibility into system behavior
- Improve observability, reliability, and performance across OpenAI’s training platform
- Debug and resolve issues in complex, high-throughput distributed systems
- Collaborate with systems, infrastructure, and research teams to evolve platform capabilities
Requirements
- Experience writing low-level software where system details matter
- Deep understanding of hardware, operating systems, networking, and concurrency
- Expertise in distributed systems
- Background in high-performance computing or low-level systems engineering
- Strong interest in performance, stability, and observability in large-scale environments
About the Company
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.
OpenAI · London
