ID.me
Posted 10 hours ago

Staff Software Engineer - AI Agent Evaluations


Perks & benefits

Medical Insurance, Health Insurance

Requirements

Bachelor's degree in Computer Science or equivalent, 8+ years building production software, Experience evaluating LLM-powered features, Proficiency in Python, Java, or Go, Experience with agentic frameworks, Experience with CI/CD and test infrastructure

Skills

Python, LLM, RAG

About the role

Responsibilities

  • Define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems in production.
  • Design and maintain evaluation pipelines for LLM outputs, agent behavior, tool use, and multi-turn interactions.
  • Build internal developer tooling and testing workflows to accelerate the development of AI features.
  • Instrument agentic systems for observability, including monitoring for behavioral drift, hallucination rates, and policy adherence.
  • Lead agentic test strategies, including red-teaming, golden dataset construction, and LLM-as-judge pipelines.
  • Partner with Security, Platform, and Product teams to embed quality gates into agent development workflows.
  • Mentor senior and mid-level engineers on evaluation design and AI testing best practices.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience.
  • 8+ years of experience building and operating production software systems.
  • Demonstrated experience evaluating or testing LLM-powered features or autonomous agents in production.
  • Proficiency in Python, Java, or Go.
  • Experience with agentic frameworks (e.g., LangChain, LangGraph, CrewAI, or Anthropic SDK).
  • Experience designing test infrastructure, CI/CD quality gates, or evaluation pipelines at scale.
  • Experience with AI-assisted development tools (e.g., Claude Code, Cursor).

Preferred Qualifications

  • Background in identity verification, fraud detection, or regulated industries.
  • Familiarity with Anthropic's model evaluation methodology or similar research.
  • Experience with observability tooling (e.g., Datadog, OpenTelemetry) applied to AI workloads.
  • Proven track record of building developer platforms or tooling adopted widely across organizations.

About the Company

ID.me is a next-generation digital identity wallet that simplifies how individuals securely prove their identity online. With over 152 million users, ID.me provides streamlined identity verification for federal agencies, state governments, healthcare organizations, and hundreds of consumer brands. We are committed to the mission of "No Identity Left Behind," ensuring everyone has access to a secure digital identity.



ID.me · Mountain View
