
Posted a day ago
Prognostics & Health Monitoring Engineer
Cerebras Systems
Requirements
Bachelor's or Master's in Engineering, CS, or Data Science; 8+ years in reliability engineering or data science; strong Python and SQL; experience with predictive models in production; expertise in applied statistics and probabilistic modeling
Skills
Python, SQL, Data Science, Statistics
About the role
Responsibilities
- Define the vision, architecture, and roadmap for Prognostics and Health Monitoring (PHM) across deployed systems
- Design and scale frameworks for health assessment, anomaly detection, and predictive failure modeling
- Develop and productionize probabilistic models for failure risk, degradation, and remaining useful life
- Analyze large-scale telemetry, logs, and service data to identify systemic drivers of failures and disruptions
- Establish health metrics, scoring systems, and fleet-level observability to communicate system risk
- Partner with system software to integrate monitoring, alerting, and automated mitigation into production
- Drive closed-loop systems covering detection, diagnosis, action, and validation
- Influence hardware design, qualification, and operations through data-driven insights
Requirements
- Bachelor’s or Master’s in Engineering, Computer Science, Data Science, or a related field
- 8+ years of experience in reliability engineering, data science, fleet analytics, or a similar domain
- Strong proficiency in Python and SQL for large-scale data analysis and modeling
- Proven experience building and deploying predictive models in a production environment
- Expertise in applied statistics and probabilistic modeling (e.g., survival analysis, hazard models, Bayesian methods)
- Experience working with large-scale telemetry or distributed system datasets
- Ability to define ambiguous problems and deliver scalable solutions
Preferred Qualifications
- Experience with HPC systems, AI infrastructure, or datacenter environments
- Background in PHM, predictive maintenance, or reliability analytics at scale
- Familiarity with Remaining Useful Life (RUL) estimation and degradation modeling
- Understanding of observability systems, telemetry pipelines, and real-time monitoring
- Background in hardware reliability and failure modes in complex systems
About the Company
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, empowering machine learning users to effortlessly run large-scale ML applications. Our technology is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via ultra-high-speed inference.
Cerebras Systems · Sunnyvale
