Perks & benefits

Medical Insurance, Health Insurance, Paid Leave

Requirements

Bachelor's degree in CS or related field, Experience in software engineering or AI/ML roles, Experience with AI model evaluation or error analysis, Proficiency in full-stack development, Knowledge of Python and web frameworks

Skills

Python, React, Machine Learning, SQL, TypeScript

About the role

Responsibilities

Build and improve Eval360, an evaluation service acting as a quality gate for model development and release decisions.
Perform deep error analysis on model outputs to identify failure patterns, categorize issues, and trace root causes.
Develop tools, dashboards, and interfaces that allow researchers to inspect model failures and compare behaviors.
Design and implement full-stack architecture, including client-side review systems and server-side evaluation pipelines.
Build and maintain back-end services, APIs, and data pipelines that support evaluation execution and results storage.
Collaborate with researchers, ML engineers, and data scientists to improve evaluation methodology and model diagnostics.
Contribute to the design of error taxonomies, evaluation rubrics, and quality thresholds.

Requirements

Bachelor's degree in Computer Science, Machine Learning, Data Science, or a related technical field.
Proven experience as a Software Engineer, Full Stack Developer, or AI/ML Evaluation Engineer.
Experience building software systems for AI, machine learning, data analysis, or model monitoring.
Experience performing error analysis, model evaluation, or failure-mode investigation for ML systems.
Proficiency in full-stack development, including front-end frameworks and back-end languages like Python.
Familiarity with databases (SQL/NoSQL), APIs, and CI/CD workflows.
Strong analytical judgment and the ability to translate qualitative model failures into structured insights.

Preferred Qualifications

Master's or Ph.D. in a relevant technical field.
Experience with large language models (LLMs), foundation models, or multimodal models.
Experience designing error taxonomies, benchmark datasets, or automated grading systems.
Proficiency with Python-based data analysis tools such as pandas, NumPy, or Jupyter.
Experience with distributed systems, job queues, or large-scale data processing.

Benefits

Comprehensive medical, dental, and vision benefits
Annual bonus
401(k) plan
Generous paid time off, sick leave, and holidays
Paid parental leave
Employee assistance program
Life and disability insurance

About the Company

The Institute of Foundation Models is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mission is to advance AI research and develop impactful systems that improve how frontier models are trained, evaluated, deployed, and governed.

Error Analysis Engineer
