
Posted 11 hours ago
Research Scientist, Agentic Data & Benchmarking
Institute of Foundation Models
Requirements
BS, MS, or PhD in Computer Science or Machine Learning; 2+ years of experience in ML evaluations or data curation; strong Python and PyTorch development skills; experience with LLM agents; experience with RL or distributed ML systems
Skills
Python, PyTorch, LLM, Reinforcement Learning, Machine Learning
About the role
Responsibilities
- Design and run evaluations of agentic capabilities, including multi-step reasoning, tool use, long-horizon planning, and safety properties.
- Build and harden evaluation harnesses to ensure benchmarks run reliably at scale against training checkpoints.
- Source, generate, and curate high-quality agentic training data, such as trajectories and tool-use traces.
- Design and scale RL environments and reward signals to improve model performance.
- Develop QA frameworks to detect reward hacking, label noise, and data contamination.
- Partner with research and product teams to translate capability goals into measurable data and evaluation artifacts.
- Contribute to technical reports, research publications, and open-source benchmarks and tooling.
Requirements
- BS, MS, or PhD in Computer Science, Machine Learning, or a related field.
- 2+ years of experience with a clear emphasis on ML evaluations or training-data curation.
- Strong Python and PyTorch development skills.
- Demonstrated experience designing evaluations or curating/generating training datasets.
- Hands-on experience using LLM agents in professional or personal projects.
Preferred Qualifications
- Experience with reinforcement learning (RL), reward design, or RL environment construction for LLMs.
- Background in statistics and experimental design, particularly signal-to-noise analysis and contamination detection.
- Experience with large-scale dataset sourcing and managing external data vendors.
- Experience building or operating scalable data pipelines and evaluation infrastructure (e.g., Ray).
- Experience evaluating or generating data for software-engineering or computer-use agents.
- Contributions to published research or open-source ML software.
About the Company
The Institute of Foundation Models (IFM) is a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
Institute of Foundation Models · Sunnyvale
