
Posted 4 days ago
Site Reliability Engineer 3
PhonePe
Perks & benefits
Accommodation, Medical Insurance, Mobile Allowance, Paid Leave, Relocation Allowance
Requirements
7-12 years of experience, Microsoft Azure expertise, Linux/Ubuntu proficiency, Python, Go, or Java, Terraform and SaltStack/Ansible, MySQL and Aerospike management, Prometheus and Grafana, Networking (BGP, IPsec, ExpressRoute)
Skills
Azure, Terraform, Python, Linux, Kubernetes, Prometheus, Docker
About the role
Responsibilities
- Manage, scale, and ensure high availability of core infrastructure within a high-volume Azure environment.
- Configure and maintain Ubuntu Virtual Machines, Azure Storage, Azure Cosmos DB, and Azure Data Explorer.
- Design and manage complex networking components including Azure Firewall, Route Tables, Virtual Network Gateways, and ExpressRoute.
- Drive automation for all business-as-usual (BAU) tasks using Terraform, SaltStack, and Ansible.
- Set up and manage high-availability databases such as MySQL and Aerospike, including cross-region replication and migrations.
- Implement and manage monitoring and observability solutions using Prometheus, VictoriaMetrics, Riemann, and Loki, with visualization in Grafana.
- Lead incident response, conduct Root Cause Analysis (RCA), and participate in an on-call rotation.
- Conduct proactive capacity planning and manage critical components like Nginx, HAProxy, Docker, and RabbitMQ.
Requirements
- 7 to 12 years of experience in Site Reliability Engineering or Infrastructure Management.
- Deep, hands-on expertise with Microsoft Azure services and complex Azure networking (BGP, IPsec, ExpressRoute).
- Expert proficiency in Linux environments, specifically Ubuntu.
- Strong programming skills in at least one high-level language: Python, Go, or Java.
- Mastery of Shell scripting (Bash) and Infrastructure as Code (Terraform).
- Extensive experience with configuration management tools like SaltStack or Ansible.
- Proven experience managing high-availability data stores (MySQL, Aerospike) and monitoring stacks (Prometheus, Grafana).
Benefits
- Comprehensive Insurance: Medical, Critical Illness, Accidental, and Life Insurance.
- Wellness Programs: Employee Assistance Program, Onsite Medical Center, and Emergency Support.
- Parental Support: Maternity, Paternity, Adoption, and Day-care support programs.
- Financial & Retirement: Employee PF, Flexible PF, Gratuity, NPS, and Leave Encashment.
- Additional Perks: Higher Education Assistance, Car Lease, and Relocation benefits.
About the Company
PhonePe is a leading digital payments platform in India, serving over 600 million registered users and 40 million merchants. We process over 330 million transactions daily and are expanding our portfolio into insurance, lending, wealth management, and hyperlocal e-commerce. At PhonePe, we empower our engineers to own their work from start to finish and solve complex problems at a massive scale.
PhonePe · Bangalore
