OpenAI
Posted a day ago

Datacenter Hardware Operations Technician Lead, Industrial Compute

Requirements

  • 8+ years of datacenter hardware experience
  • Expertise in server platforms and GPU systems
  • Experience with root cause analysis
  • Hardware reliability engineering knowledge
  • Ability to work onsite in Abilene, Texas 5 days per week

Skills

GPU

About the role

Responsibilities

  • Serve as the senior on-site hardware operations lead for server, GPU, storage, and rack-level infrastructure.
  • Drive technical triage and resolution of complex hardware failures impacting production systems.
  • Lead root cause analysis (RCA) efforts for critical hardware incidents and develop corrective action plans.
  • Partner with Fleet Health Engineering to investigate recurring hardware issues and improve fleet reliability.
  • Collaborate with Oracle operations teams and OEM vendors to coordinate repairs, upgrades, and lifecycle activities.
  • Establish and improve hardware maintenance procedures, operational runbooks, and troubleshooting standards.
  • Mentor technicians and partner teams on advanced troubleshooting methodologies and operational excellence.

Requirements

  • 8+ years of experience supporting large-scale datacenter hardware infrastructure.
  • Deep expertise with server platforms, GPU systems, storage infrastructure, and rack integration.
  • Strong experience diagnosing complex hardware failures and leading repair efforts in production environments.
  • Proven experience conducting root cause analysis and driving long-term corrective actions.
  • Strong understanding of hardware reliability engineering principles and fleet-health management.
  • Ability to work onsite in Abilene, Texas 5 days per week.

Preferred Qualifications

  • Experience supporting large-scale GPU clusters or AI/ML infrastructure environments.
  • Familiarity with fleet health systems, telemetry platforms, and hardware monitoring tools.
  • Experience with failure analysis methodologies such as FRACAS, RCCA, 5-Why, or FMEA.
  • Knowledge of Linux system administration and hardware validation workflows.
  • Experience supporting hyperscale datacenter operations or HPC environments.

Benefits

  • Medical, dental, and vision insurance with employer HSA contributions.
  • 401(k) retirement plan with employer match.
  • Paid parental leave and paid medical/caregiver leave.
  • Flexible PTO for exempt employees and paid company holidays.
  • Mental health and wellness support.
  • Annual learning and development stipend.
  • Daily meals in offices or meal delivery credits.
  • Relocation support for eligible employees.

About the Company

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products.
