
Posted 6 hours ago
Agentic Infrastructure Observability Engineer
Xenonstack Private Limited
Requirements
3–6 years of SRE, DevOps, or observability experience; knowledge of Prometheus, Grafana, ELK, OpenTelemetry, and Jaeger; cloud-native infrastructure (AWS, GCP, Azure) and Kubernetes; proficiency in Python, Go, or Bash; understanding of AI/LLM pipelines and RAG systems; hands-on experience with CI/CD and monitoring-as-code.
Skills
Prometheus, Grafana, Kubernetes, Python, OpenTelemetry, AWS
About the role
Responsibilities
- Design and implement end-to-end observability pipelines covering metrics, logs, traces, and cost telemetry for agentic systems.
- Build dashboards and alerting systems to monitor reliability, performance, and drift in real time.
- Track LLM usage, context windows, token allocation, and multi-agent interactions.
- Build monitoring hooks into LangChain, LangGraph, MCP, and RAG pipelines.
- Define and monitor SLOs, SLIs, and SLAs for agentic workflows and inference infrastructure.
- Conduct root cause analysis of agent failures, latency issues, and cost spikes.
- Integrate observability into CI/CD and AgentOps pipelines.
- Develop custom plugins and scripts to extend observability for LLMs and data pipelines.
Requirements
- 3–6 years of experience in SRE, DevOps, or Observability Engineering.
- Strong knowledge of observability tools such as Prometheus, Grafana, ELK, OpenTelemetry, and Jaeger.
- Experience with cloud-native infrastructure (AWS, GCP, or Azure) and Kubernetes monitoring.
- Proficiency in Python, Go, or Bash for scripting and automation.
- Understanding of AI/LLM pipelines, RAG systems, and vector databases.
- Hands-on experience with CI/CD pipelines and monitoring-as-code.
Preferred Qualifications
- Experience with AgentOps tools like LangSmith, PromptLayer, Arize AI, or Weights & Biases.
- Exposure to AI-specific observability, including token usage, model latency, and hallucination tracking.
- Knowledge of Responsible AI monitoring frameworks.
- Background working in regulated industries such as BFSI or GRC.
About the Company
Xenonstack is a fast-growing Data and AI Foundry for Agentic Systems. We enable enterprises to gain real-time and intelligent business insights through our specialized platforms for AI Agents, Vision AI, and Inference AI Infrastructure. Our mission is to accelerate the transition to AI + Human Intelligence by building platforms that are scalable, reliable, and observable by design.
Xenonstack Private Limited · Mohali
