
Posted 3 days ago
Principal Site Reliability Engineer
AIFT
Requirements
8+ years in software, network, or systems engineering; 6+ years with large-scale cloud services; 2+ years of SRE leadership; business-level English fluency; infrastructure planning and optimization; budget and OKR planning; monitoring solutions (Prometheus, Grafana, ELK); SDLC experience; network security knowledge; GitLab CI/CD implementation; Ansible and Terraform proficiency; Kubernetes and Docker knowledge
Skills
Kubernetes, Terraform, Python, Go, Ansible, Prometheus, Docker, CI/CD
About the role
Responsibilities
- Lead the development, construction, and management of reliable, distributed systems and large-scale cloud services.
- Plan infrastructure upgrades and optimizations to support business operations.
- Manage cloud budgets and ensure expenses remain within allocated limits.
- Drive OKR planning to ensure technical key results align with company objectives.
- Implement and maintain robust monitoring solutions and CI/CD processes.
- Oversee the complete software development life cycle (SDLC) from a reliability perspective.
Requirements
- 8+ years of technical experience in software engineering, network engineering, or systems administration.
- 6+ years of experience operating large-scale cloud services.
- 2+ years of experience in an SRE team leadership role.
- Proficiency in programming languages such as Bash, Python, or Go.
- Hands-on experience with Ansible, Terraform, Kubernetes, and Docker.
- Advanced knowledge of monitoring tools, including Prometheus, Grafana, and the ELK stack.
- Experience implementing GitLab CI/CD and using Git version control.
- Strong understanding of network security and infrastructure automation.
- Business-level fluency in English.
Preferred Qualifications
- Experience with AI pair-programming tools, such as those from OpenAI.
- Proven ability to work effectively in a team-oriented environment with strong interpersonal skills.
About the Company
AIFT is dedicated to building cutting-edge technology solutions, focusing on high-scale, reliable, and distributed systems to drive business innovation.
AIFT · Taipei
