
Posted 4 days ago
ML Infrastructure Service Reliability Engineer
Apple
Requirements
5+ years of experience in cloud scaling; deep expertise in Kubernetes; proficiency in Python, Go, or Rust; experience with Amazon S3 or GCS; strong networking troubleshooting skills; understanding of Linux internals
Skills
Kubernetes, Python, Go, Rust, AWS, GCP, Linux
About the role
Responsibilities
- Participate in a rotating on-call schedule, including occasional weekend coverage
- Manage and scale Apple’s largest ML compute platform and multi-cloud storage abstraction
- Oversee the full infrastructure stack from low-level nodes to complete network architecture
- Leverage a diverse stack of open-source tools, commercial solutions, and internal systems
- Drive automation and operational efficiency to ensure high availability and resilience
Requirements
- 5+ years of experience building, operating, and scaling large applications in cloud environments
- Deep expertise in Kubernetes, including hands-on experience with GKE or EKS
- Proficiency in designing and developing code in Python, Go, or Rust
- Practical experience with object storage technologies such as Amazon S3 or Google Cloud Storage (GCS)
- Strong background in troubleshooting complex networking issues in public and private clouds
- Solid understanding of Linux internals, standard networking protocols, and distributed systems
Preferred Qualifications
- Proven drive to automate manual operations through continuous iteration
- Experience managing diverse system environments using tools like Spinnaker, Helm, or Flux
- Expertise in deploying, supporting, and monitoring large-scale distributed application stacks
- Strong understanding of best practices for deploying large-scale distributed applications
About the Company
Apple creates transformative experiences that reshape entire industries. The ML Infrastructure team is responsible for managing the critical machine learning training workloads that power user-facing features across the Apple ecosystem.
Apple · Bengaluru
