
Posted a day ago
Principal Engineer, Inference Cloud
Cerebras Systems
Requirements
10+ years of software engineering experience, expertise in distributed systems architecture, experience with large-scale cloud infrastructure, proficiency in Go, C++, or Python, experience with high-QPS systems, knowledge of observability and reliability practices
Skills
Distributed Systems, Cloud Infrastructure, Go
About the role
Responsibilities
- Identify and prioritize critical technical problems for the Inference Cloud Platform, making explicit tradeoff decisions regarding platform support and evolution.
- Set the long-term technical direction for the platform, including multi-region topology, failure domains, and service boundaries.
- Architect highly available, active-active systems featuring rapid failover and graceful degradation mechanisms like circuit breaking and load shedding.
- Drive continuous improvements in latency, throughput, capacity efficiency, and resilience under unpredictable, bursty AI workloads.
- Contribute production code to critical paths and lead design and code reviews to ensure high engineering standards.
- Lead incident response, observability, and capacity planning to maintain high operational rigor and reliability.
- Mentor engineers and influence technical strategy across adjacent teams regarding API design, deployment strategy, and shared infrastructure.
Requirements
- 10+ years of software engineering experience, with a focus on building and operating large-scale distributed systems or cloud infrastructure.
- Deep expertise in distributed systems architecture, including networking, compute orchestration, and multi-region production services.
- Proven track record of making sound architectural decisions for highly available, latency-sensitive systems at scale.
- Experience optimizing latency and throughput in high-QPS systems.
- Strong proficiency in backend or systems programming languages such as Go, C++, or Python.
- Experience designing and implementing observability and reliability practices (metrics, logging, tracing, and SLI/SLO-driven operations).
- Ability to influence senior technical leadership and cross-functional partners through technical credibility and judgment.
Preferred Qualifications
- Experience with ML inference infrastructure, model serving systems, or GPU-accelerated workloads.
- Specific experience with TTFT (Time To First Token) and tail-latency reduction.
About the Company
Cerebras Systems builds the world's largest AI chip, providing the compute power of dozens of GPUs on a single wafer-scale architecture. Our technology delivers industry-leading training and inference speeds, empowering users to run large-scale ML applications effortlessly. Cerebras Inference offers the fastest Generative AI inference solution in the world, transforming the user experience for top model labs and global enterprises.
Cerebras Systems · Sunnyvale
