
Posted 11 hours ago
Senior AI Infrastructure & Platform Engineer
DeepSource Technologies
Requirements
GPU-based AI/ML infrastructure management, Nvidia Base Command Manager, Nvidia AI Enterprise Suite, Slurm or Kubernetes orchestration, Linux system administration (Ubuntu), Bash or Python scripting
Skills
NVIDIA, Slurm, Kubernetes, Python, Linux, Bash
About the role
Responsibilities
- Deploy, maintain, and optimize GPU-based compute clusters and infrastructure
- Manage and operate GPU orchestration tools including Nvidia Base Command Manager and Nvidia AI Enterprise Suite
- Configure and maintain compute workloads using Slurm or Kubernetes orchestration
- Install and maintain underlying operating systems, specifically Canonical Ubuntu
- Monitor and troubleshoot infrastructure performance to ensure high uptime for AI/ML workloads
- Collaborate with data scientists and ML engineers to define resource allocation and deployment workflows
- Develop automation scripts and CI/CD pipelines for infrastructure provisioning
Requirements
- Proven experience managing GPU-based AI/ML infrastructure and compute clusters
- Hands-on experience with Nvidia Base Command Manager and Nvidia AI Enterprise Suite
- Strong experience with Slurm and/or Kubernetes orchestration
- Solid Linux system administration skills, preferably on Ubuntu
- Strong scripting and automation ability using Bash or Python
- Excellent troubleshooting and performance-tuning skills
- Strong understanding of networking, security, and cluster management best practices
Preferred Qualifications
- Previous experience in a high-performance computing (HPC) or AI-focused infrastructure team
- Knowledge of containerization and GPU workloads in cloud or on-premises environments
- Experience with Infrastructure-as-Code tools such as Terraform or Ansible
- Familiarity with workload scheduling, job queuing, and shared-GPU environments
About the Company
DeepSource Technologies provides advanced technological solutions and infrastructure support to drive innovation in the AI and machine learning sectors.
DeepSource Technologies · Riyadh
