Site Reliability Engineer at ByteDance
ByteDance
Posted 21 hours ago

Site Reliability Engineer

ByteDance · Site Reliability Engineer, System - System Service Global

Requirements

  • Bachelor's degree in CS or a related field
  • Large-scale Linux host management experience
  • Knowledge of DNS, NTP, DHCP, NAT, and Kerberos
  • Proficiency with Ansible, Salt, or Puppet
  • Understanding of SRE principles and SLO/SLI
  • High availability and disaster recovery design

Skills

Linux, DNS, Ansible, Python, Go, Bash, DevOps

About the role

Responsibilities

  • Manage and maintain large-scale host infrastructure across non-China data centers, including OS lifecycle management and fleet-wide health monitoring.
  • Own the reliability and availability of core foundational services such as DNS, NTP, DHCP, NAT, and Kerberos.
  • Design and implement deployment architectures ensuring high availability, fault tolerance, and disaster recovery across regions.
  • Develop and enforce SLOs for managed services and lead incident response, root cause analysis, and post-mortem reviews.
  • Collaborate with network, security, and application teams to support global business growth.
  • Identify automation opportunities to reduce toil and increase operational efficiency through tooling and process improvements.

Requirements

  • Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field.
  • Solid experience in large-scale Linux host management, including OS deployment, configuration management, and patching.
  • Strong hands-on knowledge of core data center services: DNS (BIND/PowerDNS), NTP, DHCP, NAT, and Kerberos.
  • Proficiency with DevOps configuration management tools such as Ansible, Salt, or Puppet.
  • Familiarity with SRE principles, including SLO/SLI definition and error budget management.
  • Understanding of high availability design patterns and disaster recovery strategies.
  • Strong troubleshooting skills across the Linux system stack and network layer.

Preferred Qualifications

  • Experience managing host fleets at scale (thousands of nodes or more) in a production environment.
  • Scripting or development experience in Python, Go, or Bash for automation.
  • Exposure to hybrid or multi-region data center environments.

About the Company

Founded in 2012, ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, CapCut, and Pico, ByteDance makes it easier and more fun for people to connect, consume, and create content globally.

ByteDance · Singapore