
Posted 4 days ago
Senior Site Reliability Engineer
Onebrief
Requirements
Active Top Secret clearance; 5+ years in Platform, DevOps, or SRE; Terraform or CloudFormation; Ansible; Kubernetes; GitLab CI/CD, Jenkins, or GitHub Actions; Python, Go, or Bash; AWS or AWS GovCloud; Grafana, ELK, or Datadog
Skills
Kubernetes, Terraform, AWS, Python, Ansible, Go
About the role
Responsibilities
- Own the reliability, scalability, and security of the production application and platform across AWS and on-premise DoD environments.
- Design, implement, and manage a world-class observability platform using tools like Prometheus, Loki, and Grafana.
- Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to increase system trust.
- Lead incident response and act as incident commander during critical events, conducting blameless post-mortems and After Action Reviews (AARs).
- Automate infrastructure using Terraform and Ansible, embedding security and compliance controls (RMF, STIGs) directly into the automation.
- Proactively identify and eliminate operational toil through advanced automation and improved runbooks.
Requirements
- Active Top Secret clearance (with the ability to obtain SCI eligibility).
- 5+ years of experience in Platform, DevOps, or Site Reliability Engineering.
- Proficiency with Infrastructure as Code tools: Terraform or CloudFormation, and Ansible.
- Hands-on experience with Kubernetes design, deployment, and operations.
- Experience building and maintaining CI/CD pipelines (GitLab CI/CD, Jenkins, or GitHub Actions).
- Proficiency in at least one programming or scripting language: Python, Go, or Bash.
- Familiarity with AWS or AWS GovCloud.
- Experience with observability stacks such as Grafana, ELK, or Datadog.
- Strong understanding of networking fundamentals and secure configurations.
Preferred Qualifications
- Experience working in DoD environments and familiarity with compliance frameworks (RMF, STIGs, ICD 503).
- Experience with GitOps practices and service mesh technologies like Istio or Linkerd.
- Familiarity with on-prem virtualization (VMware, Proxmox, Nutanix, or Hyper-V).
- Relevant certifications such as AWS DevOps Engineer, CKA/CKAD, or Security+.
Benefits
- Compensation: $180K – $220K plus equity.
- Comprehensive health, dental, vision, and life insurance.
- 401(k) plan with company match.
- Unlimited PTO and 8 weeks of fully paid parental leave.
- Annual company summit retreats and a $1,000 annual home office budget.
About the Company
Onebrief builds collaboration and AI-powered workflow software designed specifically for military staffs. We transform military planning to make staffs faster, smarter, and more efficient. Founded in 2019, Onebrief is a high-growth organization valued at $2.15B, backed by top-tier investors including Battery Ventures and General Catalyst.
Onebrief · Arlington
