Requirements

BS, MS, or PhD in Computer Science or ML
2+ years of experience in ML evaluations or data curation
Strong Python and PyTorch development skills
Experience with LLM agents
Experience with RL or distributed ML systems

Skills

Python, PyTorch, LLM, Reinforcement Learning, Machine Learning

About the role

Responsibilities

Design and run evaluations of agentic capabilities, including multi-step reasoning, tool use, long-horizon planning, and safety properties.
Build and harden evaluation harnesses to ensure benchmarks run reliably at scale against training checkpoints.
Source, generate, and curate high-quality agentic training data, such as trajectories and tool-use traces.
Design and scale RL environments and reward signals to improve model performance.
Develop QA frameworks to detect reward hacking, label noise, and data contamination.
Partner with research and product teams to translate capability goals into measurable data and evaluation artifacts.

Requirements

BS, MS, or PhD in Computer Science, Machine Learning, or a related field.
2+ years of experience with a focus on ML evaluations or training-data curation.
Strong Python and PyTorch development skills.
Demonstrated experience designing evaluations or curating/generating training datasets.
Hands-on experience using LLM agents in professional or personal projects.

Preferred Qualifications

Experience with reinforcement learning (RL), reward design, or RL environment construction.
Background in statistics and experimental design.
Experience with large-scale dataset sourcing and managing external data vendors.
Familiarity with the literature on agent evaluation, LLM reasoning, and tool use.
Experience building scalable data pipelines or evaluation infrastructure (e.g., Ray).
Contributions to published research, public benchmarks, or open-source ML software.

About the Company

The Institute of Foundation Models (IFM) is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mission is to advance AI research, nurture the next generation of builders, and drive transformative contributions to a knowledge-driven economy.

Research Scientist, Agentic Data & Benchmarking
