
Posted 18 hours ago
AI Evaluation Engineer
FNZ
Requirements
- 3-6 years in software testing, QA, AI/ML, or data science
- Hands-on test automation skills
- Experience evaluating LLM applications or RAG systems
- Understanding of prompt engineering and agent architectures
- Analytical mindset
Skills
AI, LLM, RAG, Python, Quality Assurance
About the role
Responsibilities
- Design and conduct evaluations covering Task Performance, Safety, Efficiency, Groundedness, Robustness, and Suitability
- Create "golden sets" of test examples representing expert judgment on desired agent behavior
- Develop evaluation rubrics and scoring criteria aligned to FNZ Evaluation Framework principles
- Build comprehensive test suites covering happy paths, edge cases, and adversarial inputs
- Evaluate multi-step agentic workflows including planning, tool selection, execution, and error handling
- Assess agent groundedness by verifying outputs against knowledge bases and detecting hallucinations
- Document findings with clear evidence and collaborate with development teams on remediation
- Contribute to the development of an automated evaluation platform and CI/CD integration
Requirements
- 3-6 years of experience in software testing, QA engineering, AI/ML development, or data science
- Hands-on test automation skills, with experience in ML frameworks being highly valuable
- Practical experience evaluating LLM applications, RAG systems, or AI agents
- Strong understanding of prompt engineering, retrieval-augmented generation, and agent architectures
- Analytical mindset with the ability to decompose complex agent behaviors and identify failure modes
- Excellent documentation and presentation skills
About the Company
FNZ is committed to opening up wealth so that everyone, everywhere can invest in their future on their terms. We provide a global, end-to-end wealth management platform that integrates modern technology with business and investment operations within a regulated financial institution. We partner with the world’s leading financial institutions, managing over US$2.4 trillion in assets on platform.
AI Evaluation Engineer
FNZ · Pune
