LLM Reliability & Evaluation Engineer at Xenonstack Private Limited - ScoutJobs - The AI-curated global job board
Xenonstack Private Limited
Posted 7 hours ago

LLM Reliability & Evaluation Engineer

Requirements

  • 3–6 years in AI/ML, NLP, or model evaluation
  • Understanding of LLM architectures and prompt engineering
  • Hands-on with Ragas, OpenAI Evals, or DeepEval
  • Proficiency in Python
  • Experience with LangChain, LangGraph, or LlamaIndex
  • Experience with vector databases and RAG pipelines

Skills

Python, LLM, NLP, LangChain, RAG

About the role

Responsibilities

  • Design and implement LLM evaluation pipelines covering accuracy, robustness, safety, and bias
  • Develop automated systems for benchmarking models on enterprise-relevant tasks
  • Conduct stress tests, adversarial testing, and edge-case evaluations
  • Build tools to measure latency, consistency, and error recovery in multi-turn interactions
  • Define KPIs such as factual accuracy, hallucination rate, toxicity, and compliance alignment
  • Establish real-time monitoring for drift, anomalies, and performance regressions
  • Partner with ML engineers and product managers to align evaluation with business objectives
  • Feed evaluation insights into fine-tuning, RLHF/RLAIF pipelines, and model selection

Requirements

  • 3–6 years of experience in AI/ML, NLP, or applied model evaluation
  • Strong understanding of LLM architectures, prompt engineering, and failure modes
  • Hands-on experience with evaluation frameworks such as Ragas, OpenAI Evals, or DeepEval
  • Proficiency in Python and libraries including LangChain, LangGraph, LlamaIndex, or Hugging Face
  • Experience with vector databases, RAG pipelines, and knowledge graph integration
  • Familiarity with bias/fairness testing and Responsible AI frameworks

Preferred Qualifications

  • Experience with reinforcement learning (RLHF, RLAIF) and reward modeling
  • Exposure to agentic evaluation frameworks and multi-agent stress testing
  • Knowledge of compliance and safety requirements for BFSI, GRC, or SOC use cases
  • Contributions to open-source evaluation libraries or research papers

About the Company

XenonStack is a fast-growing Data and AI Foundry for Agentic Systems. We enable enterprises to gain real-time and intelligent business insights by making AI agents reliable, explainable, and enterprise-ready. Our mission is to accelerate the world’s transition to AI + Human Intelligence through cutting-edge platforms in Vision AI and Inference AI infrastructure.

Xenonstack Private Limited · Mohali