
Posted 18 hours ago
Senior Lead Site Reliability Engineer - AI/ML and Data Platforms
JPMorgan Chase
Requirements
- 5+ years applied SRE experience
- Advanced understanding of SLI/SLO/SLA
- Experience with observability tools (Grafana, Prometheus, Splunk)
- Knowledge of distributed systems and system design
- Experience with AI-assisted reliability workflows
Skills
Python, AWS, Kubernetes, Terraform, Databricks, Spark
About the role
Responsibilities
- Define non-functional requirements (NFRs) and availability targets for large-scale data platforms and AI/ML workloads.
- Create and deliver high-quality designs, roadmaps, and program charters for distributed systems initiatives.
- Implement observability and reliability designs to ensure robust, stable, and scalable analytics environments.
- Lead the adoption of AI-assisted reliability workflows across the SDLC, including testing, validation, and production readiness.
- Use enterprise-authorized AI capabilities to accelerate incident analysis and operational decision-making.
- Mentor technologists and serve as a site reliability adoption champion within the engineering community.
Requirements
- 5+ years of applied Site Reliability Engineering (SRE) experience.
- Advanced understanding of SRE principles, including SLI, SLO, SLA, and error budgets.
- Extensive experience with observability tools such as Grafana, Prometheus, Splunk, Dynatrace, or Datadog.
- Demonstrated experience using AI capabilities to improve reliability engineering workflows.
- Strong knowledge of distributed systems, system design, resiliency, and disaster recovery.
- Ability to communicate complex data-based solutions and collaborate effectively across cross-functional teams.
Preferred Qualifications
- Experience with AWS platforms and managed data platforms like Databricks.
- Experience building and managing data pipelines using Spark or similar distributed compute frameworks.
- Knowledge of containerization (Docker, Kubernetes) and orchestration frameworks.
- Proficiency in Python or similar programming languages for automation and platform development.
- Experience with CI/CD pipelines, automation frameworks, and Infrastructure as Code (e.g., Terraform).
Benefits
- Competitive total rewards package including base salary and discretionary incentive compensation.
- Comprehensive health care coverage.
- Retirement savings plans.
- Tuition reimbursement and mental health support.
- Financial coaching and on-site health and wellness centers.
About the Company
JPMorgan Chase is a leading global financial institution providing innovative solutions to millions of consumers, small businesses, and prominent corporate and government clients. We are committed to diversity and inclusion, leveraging the diverse talents of our global workforce to drive success across investment banking, consumer banking, and asset management.
JPMorgan Chase · Jersey City
