
Posted a day ago
Senior Lead Site Reliability Engineer
JPMorgan Chase: Senior Lead Site Reliability Engineer - AI/ML and Data Platforms
Requirements
5+ years of applied SRE experience; advanced understanding of SLI/SLO/SLA; experience with observability tools (Grafana, Prometheus, Splunk); knowledge of distributed systems and system design; experience with AI capabilities in reliability workflows
Skills
Python, AWS, Kubernetes, Terraform, Databricks, Spark
About the role
Responsibilities
- Define non-functional requirements (NFRs) and availability targets for large-scale data platforms and AI/ML workloads.
- Create and deliver high-quality designs, roadmaps, and program charters for distributed systems initiatives.
- Implement observability and reliability designs to ensure robust, stable, and scalable analytics environments.
- Lead the adoption of AI-assisted reliability workflows across the SDLC, including testing, validation, and production readiness.
- Use enterprise-authorized AI capabilities to accelerate incident analysis and operational decision-making.
- Mentor technologists and serve as a site reliability adoption champion within the engineering community.
Requirements
- 5+ years of applied Site Reliability Engineering (SRE) experience.
- Advanced understanding of SRE principles, including SLI, SLO, SLA, and error budgets.
- Extensive experience with observability tools such as Grafana, Prometheus, Splunk, Dynatrace, or Datadog.
- Demonstrated experience using AI capabilities to improve reliability engineering workflows.
- Strong knowledge of distributed systems, system design, resiliency, and disaster recovery.
- Ability to communicate complex data-based solutions and collaborate effectively across cross-functional teams.
Preferred Qualifications
- Experience with AWS platforms and managed data platforms like Databricks.
- Experience building and managing data pipelines using Spark or similar distributed compute frameworks.
- Knowledge of containerization and orchestration tools such as Docker and Kubernetes.
- Proficiency in Python or other programming languages for automation and platform development.
- Experience with CI/CD pipelines, automation frameworks, and Infrastructure as Code (e.g., Terraform).
About the Company
JPMorgan Chase is a leading global financial institution providing innovative solutions to millions of consumers, small businesses, and prominent corporate and government clients. We leverage cutting-edge technology to drive excellence in investment banking, asset management, and consumer banking.
JPMorgan Chase · Jersey City
