JPMorgan Chase
Posted 18 hours ago

Senior Lead Site Reliability Engineer - AI/ML and Data Platforms

Requirements

  • 5+ years of applied SRE experience
  • Advanced understanding of SLI/SLO/SLA
  • Experience with observability tools (Grafana, Prometheus, Splunk)
  • Knowledge of distributed systems and system design
  • Experience with AI-assisted reliability workflows

Skills

Python, AWS, Kubernetes, Terraform, Databricks, Spark

About the role

Responsibilities

  • Define non-functional requirements (NFRs) and availability targets for large-scale data platforms and AI/ML workloads.
  • Create and deliver high-quality designs, roadmaps, and program charters for distributed systems initiatives.
  • Implement observability and reliability designs to ensure robust, stable, and scalable analytics environments.
  • Lead the adoption of AI-assisted reliability workflows across the SDLC, including testing, validation, and production readiness.
  • Use enterprise-authorized AI capabilities to accelerate incident analysis and operational decisioning.
  • Mentor technologists and serve as a site reliability adoption champion within the engineering community.

Requirements

  • 5+ years of applied Site Reliability Engineering (SRE) experience.
  • Advanced understanding of SRE principles, including SLI, SLO, SLA, and error budgets.
  • Extensive experience with observability tools such as Grafana, Prometheus, Splunk, Dynatrace, or Datadog.
  • Demonstrated experience using AI capabilities to improve reliability engineering workflows.
  • Strong knowledge of distributed systems, system design, resiliency, and disaster recovery.
  • Ability to communicate complex data-based solutions and collaborate effectively across cross-functional teams.

Preferred Qualifications

  • Experience with AWS and managed data platforms such as Databricks.
  • Experience building and managing data pipelines using Spark or similar distributed compute frameworks.
  • Knowledge of containerization (Docker, Kubernetes) and orchestration frameworks.
  • Proficiency in Python or similar programming languages for automation and platform development.
  • Experience with CI/CD pipelines, automation frameworks, and Infrastructure as Code (e.g., Terraform).

Benefits

  • Competitive total rewards package including base salary and discretionary incentive compensation.
  • Comprehensive health care coverage.
  • Retirement savings plans.
  • Tuition reimbursement and mental health support.
  • Financial coaching and on-site health and wellness centers.

About the Company

JPMorgan Chase is a leading global financial institution providing innovative solutions to millions of consumers, small businesses, and prominent corporate and government clients. We are committed to diversity and inclusion, leveraging the diverse talents of our global workforce to drive success across investment banking, consumer banking, and asset management.

JPMorgan Chase · Jersey City
