Agentic Infrastructure Observability Engineer

Xenonstack Private Limited

Posted 6 hours ago

Skills

Prometheus, Grafana, Kubernetes, Python, OpenTelemetry, AWS

About the role

Responsibilities

  • Design and implement end-to-end observability pipelines covering metrics, logs, traces, and cost telemetry for agentic systems.
  • Build dashboards and alerting systems to monitor reliability, performance, and drift in real time.
  • Track LLM usage, context windows, token allocation, and multi-agent interactions.
  • Build monitoring hooks into LangChain, LangGraph, MCP, and RAG pipelines.
  • Define and monitor SLOs, SLIs, and SLAs for agentic workflows and inference infrastructure.
  • Conduct root cause analysis of agent failures, latency issues, and cost spikes.
  • Integrate observability into CI/CD and AgentOps pipelines.
  • Develop custom plugins and scripts to extend observability for LLMs and data pipelines.

Requirements

  • 3–6 years of experience in SRE, DevOps, or Observability Engineering.
  • Strong knowledge of observability tools such as Prometheus, Grafana, ELK, OpenTelemetry, and Jaeger.
  • Experience with cloud-native infrastructure (AWS, GCP, or Azure) and Kubernetes monitoring.
  • Proficiency in Python, Go, or Bash for scripting and automation.
  • Understanding of AI/LLM pipelines, RAG systems, and vector databases.
  • Hands-on experience with CI/CD pipelines and monitoring-as-code.

Preferred Qualifications

  • Experience with AgentOps tools like LangSmith, PromptLayer, Arize AI, or Weights & Biases.
  • Exposure to AI-specific observability including token usage, model latency, and hallucination tracking.
  • Knowledge of Responsible AI monitoring frameworks.
  • Background working in regulated industries such as BFSI (banking, financial services, and insurance) or GRC (governance, risk, and compliance).

About the Company

Xenonstack is a fast-growing Data and AI Foundry for Agentic Systems. We enable enterprises to gain real-time and intelligent business insights through our specialized platforms for AI Agents, Vision AI, and Inference AI Infrastructure. Our mission is to accelerate the transition to AI + Human Intelligence by building platforms that are scalable, reliable, and observable by design.

Xenonstack Private Limited · Mohali