Research Scientist, Agentic Data & Benchmarking at Institute of Foundation Models - ScoutJobs - The AI-curated global job board
Institute of Foundation Models
Posted 11 hours ago

Research Scientist, Agentic Data & Benchmarking

Requirements

  • BS, MS, or PhD in Computer Science or ML
  • 2+ years of experience in ML evaluations or data curation
  • Strong Python and PyTorch development
  • Experience with LLM agents
  • Experience with RL or distributed ML systems

Skills

Python, PyTorch, LLM, Reinforcement Learning, Machine Learning

About the role

Responsibilities

  • Design and run evaluations of agentic capabilities, including multi-step reasoning, tool use, long-horizon planning, and safety properties.
  • Build and harden evaluation harnesses to ensure benchmarks run reliably at scale against training checkpoints.
  • Source, generate, and curate high-quality agentic training data, such as trajectories and tool-use traces.
  • Design and scale RL environments and reward signals to improve model performance.
  • Develop QA frameworks to detect reward hacking, label noise, and data contamination.
  • Partner with research and product teams to translate capability goals into measurable data and evaluation artifacts.

Requirements

  • BS, MS, or PhD in Computer Science, Machine Learning, or a related field.
  • 2+ years of experience with a focus on ML evaluations or training-data curation.
  • Strong Python and PyTorch development skills.
  • Demonstrated experience designing evaluations or curating/generating training datasets.
  • Hands-on experience using LLM agents in professional or personal projects.

Preferred Qualifications

  • Experience with reinforcement learning (RL), reward design, or RL environment construction.
  • Background in statistics and experimental design.
  • Experience with large-scale dataset sourcing and managing external data vendors.
  • Familiarity with the literature on agent evaluation, LLM reasoning, and tool use.
  • Experience building scalable data pipelines or evaluation infrastructure (e.g., Ray).
  • Contributions to published research, public benchmarks, or open-source ML software.

About the Company

The Institute of Foundation Models (IFM) is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mission is to advance AI research, nurture the next generation of builders, and drive transformative contributions to a knowledge-driven economy.

Institute of Foundation Models · Sunnyvale
