Skills

Python, SQL, Data Science, Statistics

About the role

Responsibilities

Define the vision, architecture, and roadmap for Prognostics and Health Monitoring (PHM) across deployed systems
Design and scale frameworks for health assessment, anomaly detection, and predictive failure modeling
Develop and productionize probabilistic models for failure risk, degradation, and remaining useful life
Analyze large-scale telemetry, logs, and service data to identify systemic drivers of failures and disruptions
Establish health metrics, scoring systems, and fleet-level observability to communicate system risk
Partner with system software teams to integrate monitoring, alerting, and automated mitigation into production
Drive closed-loop systems covering detection, diagnosis, action, and validation
Influence hardware design, qualification, and operations through data-driven insights

Requirements

Bachelor’s or Master’s in Engineering, Computer Science, Data Science, or a related field
8+ years of experience in reliability engineering, data science, fleet analytics, or a similar domain
Strong proficiency in Python and SQL for large-scale data analysis and modeling
Proven experience building and deploying predictive models in a production environment
Expertise in applied statistics and probabilistic modeling (e.g., survival analysis, hazard models, Bayesian methods)
Experience working with large-scale telemetry or distributed system datasets
Ability to define ambiguous problems and deliver scalable solutions

Preferred Qualifications

Experience with HPC systems, AI infrastructure, or datacenter environments
Background in PHM, predictive maintenance, or reliability analytics at scale
Familiarity with Remaining Useful Life (RUL) estimation and degradation modeling
Understanding of observability systems, telemetry pipelines, and real-time monitoring
Background in hardware reliability and failure modes in complex systems

About the Company

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, empowering machine learning users to effortlessly run large-scale ML applications. Our technology is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via ultra-high-speed inference.

Prognostics & Health Monitoring Engineer