Requirements

10+ years of software engineering experience
Expertise in distributed systems architecture
Experience with large-scale cloud infrastructure
Proficiency in Go, C++, or Python
Experience with high-QPS systems
Knowledge of observability and reliability practices

Skills

Distributed Systems, Cloud Infrastructure, Go

About the role

Responsibilities

Identify and prioritize critical technical problems for the Inference Cloud Platform, making explicit tradeoff decisions regarding platform support and evolution.
Set the long-term technical direction for the platform, including multi-region topology, failure domains, and service boundaries.
Architect highly available, active-active systems featuring rapid failover and graceful degradation mechanisms like circuit breaking and load shedding.
Drive continuous improvements in latency, throughput, capacity efficiency, and resilience under unpredictable, bursty AI workloads.
Contribute production code to critical paths and lead design and code reviews to ensure high engineering standards.
Lead incident response, observability, and capacity planning to maintain high operational rigor and reliability.
Mentor engineers and influence technical strategy across adjacent teams regarding API design, deployment strategy, and shared infrastructure.

Requirements

10+ years of software engineering experience, with a focus on building and operating large-scale distributed systems or cloud infrastructure.
Deep expertise in distributed systems architecture, including networking, compute orchestration, and multi-region production services.
Proven track record of making sound architectural decisions for highly available, latency-sensitive systems at scale.
Experience optimizing performance in high-QPS systems, specifically regarding latency and throughput.
Strong proficiency in backend or systems programming languages such as Go, C++, or Python.
Experience designing and implementing observability and reliability practices (metrics, logging, tracing, and SLI/SLO-driven operations).
Ability to influence senior technical leadership and cross-functional partners through technical credibility and judgment.

Preferred Qualifications

Experience with ML inference infrastructure, model serving systems, or GPU-accelerated workloads.
Specific experience reducing time to first token (TTFT) and tail latency.

About the Company

Cerebras Systems builds the world's largest AI chip, providing the compute power of dozens of GPUs on a single wafer-scale architecture. Our technology delivers industry-leading training and inference speeds, empowering users to run large-scale ML applications effortlessly. Cerebras Inference offers the fastest Generative AI inference solution in the world, transforming the user experience for top model labs and global enterprises.

Principal Engineer, Inference Cloud
