
Posted 10 hours ago
Staff Software Engineer - AI Agent Evaluations
ID.me
Perks & benefits
Medical Insurance, Health Insurance
Skills
Python, LLM, RAG
About the role
Responsibilities
- Define and lead the discipline of testing AI agents, evaluating LLM behavior, and ensuring the reliability of agentic systems in production.
- Design and maintain evaluation pipelines for LLM outputs, agent behavior, tool use, and multi-turn interactions.
- Build internal developer tooling and testing workflows to accelerate the development of AI features.
- Instrument agentic systems for observability, and monitor them for behavioral drift, hallucination rates, and policy adherence.
- Lead agentic test strategies, including red-teaming, golden dataset construction, and LLM-as-judge pipelines.
- Partner with Security, Platform, and Product teams to embed quality gates into agent development workflows.
- Mentor senior and mid-level engineers on evaluation design and AI testing best practices.
Requirements
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- 8+ years of experience building and operating production software systems.
- Demonstrated experience evaluating or testing LLM-powered features or autonomous agents in production.
- Proficiency in Python, Java, or Go.
- Experience with agentic frameworks (e.g., LangChain, LangGraph, CrewAI, or Anthropic SDK).
- Experience designing test infrastructure, CI/CD quality gates, or evaluation pipelines at scale.
- Experience with AI-assisted development tools (e.g., Claude Code, Cursor).
Preferred Qualifications
- Background in identity verification, fraud detection, or regulated industries.
- Familiarity with Anthropic's model evaluation methodology or similar research.
- Experience with observability tooling (e.g., Datadog, OpenTelemetry) applied to AI workloads.
- Proven track record of building developer platforms or tooling adopted widely across organizations.
About the Company
ID.me is a next-generation digital identity wallet that simplifies how individuals securely prove their identity online. With over 152 million users, ID.me provides streamlined identity verification for federal agencies, state governments, healthcare organizations, and hundreds of consumer brands. We are committed to the mission of "No Identity Left Behind," ensuring everyone has access to a secure digital identity.
ID.me · Mountain View
