Senior AI Infrastructure & Platform Engineer at DeepSource Technologies - ScoutJobs - The AI-curated global job board
DeepSource Technologies
Posted 11 hours ago

Senior AI Infrastructure & Platform Engineer

Requirements

GPU-based AI/ML infrastructure management, Nvidia Base Command Manager, Nvidia AI Enterprise Suite, Slurm or Kubernetes orchestration, Linux system administration (Ubuntu), Bash or Python scripting

Skills

NVIDIA, Slurm, Kubernetes, Python, Linux, Bash

About the role

Responsibilities

  • Deploy, maintain, and optimize GPU-based compute clusters and infrastructure
  • Manage and operate GPU orchestration tools including Nvidia Base Command Manager and Nvidia AI Enterprise Suite
  • Configure and maintain compute workloads using Slurm or Kubernetes orchestration
  • Install and maintain underlying operating systems, specifically Canonical Ubuntu
  • Monitor and troubleshoot infrastructure performance to ensure high uptime for AI/ML workloads
  • Collaborate with data scientists and ML engineers to define resource allocation and deployment workflows
  • Develop automation scripts and CI/CD pipelines for infrastructure provisioning

Requirements

  • Proven experience managing GPU-based AI/ML infrastructure and compute clusters
  • Hands-on experience with Nvidia Base Command Manager and Nvidia AI Enterprise Suite
  • Strong experience with Slurm and/or Kubernetes orchestration
  • Solid Linux system administration skills, preferably on Ubuntu
  • Strong scripting and automation ability using Bash or Python
  • Excellent troubleshooting and performance-tuning skills
  • Strong understanding of networking, security, and cluster management best practices

Preferred Qualifications

  • Previous experience in a high-performance computing (HPC) or AI-focused infrastructure team
  • Knowledge of containerization and GPU usage in cloud or on-prem environments
  • Experience with Infrastructure-as-Code tools such as Terraform or Ansible
  • Familiarity with workload scheduling, job queuing, and shared-GPU environments

About the Company

DeepSource Technologies provides advanced technological solutions and infrastructure support to drive innovation in the AI and machine learning sectors.

DeepSource Technologies · Riyadh
