
LLM Reliability & Evaluation Engineer
Xenonstack Private Limited
Skills
Python, LLM, NLP, LangChain, RAG
About the role
Responsibilities
- Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias
- Develop automated systems for benchmarking models on enterprise-relevant tasks
- Conduct stress tests, adversarial testing, and edge-case evaluations
- Build tools to measure latency, consistency, and error recovery in multi-turn interactions
- Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment
- Establish real-time monitoring for drift, anomalies, and performance regressions
- Partner with ML engineers and product managers to align evaluation with business objectives
- Feed evaluation insights into fine-tuning, RLHF/RLAIF pipelines, and model selection
Requirements
- 3–6 years of experience in AI/ML, NLP, or applied model evaluation
- Strong understanding of LLM architectures, prompt engineering, and failure modes
- Hands-on experience with evaluation frameworks such as Ragas, OpenAI Evals, or DeepEval
- Proficiency in Python and libraries such as LangChain, LangGraph, LlamaIndex, or Hugging Face
- Experience with vector databases, RAG pipelines, and knowledge graph integration
- Familiarity with bias/fairness testing and Responsible AI frameworks
Preferred Qualifications
- Experience with reinforcement learning (RLHF, RLAIF) and reward modeling
- Exposure to agentic evaluation frameworks and multi-agent stress testing
- Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases
- Contributions to open-source evaluation libraries or research papers
About the Company
XenonStack is a fast-growing Data and AI Foundry for Agentic Systems. We enable enterprises to gain real-time and intelligent business insights by making AI agents reliable, explainable, and enterprise-ready. Our mission is to accelerate the world’s transition to AI + Human Intelligence through cutting-edge platforms in Vision AI and Inference AI infrastructure.
Xenonstack Private Limited · Mohali
