Posted 21 hours ago

Senior Systems Software Engineer, AI Stack and Performance - DGX Station

NVIDIA Corporation

Requirements

BS or MS in Computer Science or related field
12+ years in systems software engineering
Proficiency in PyTorch, TensorFlow, or JAX
Experience with Nsight Systems or Nsight Compute
Strong understanding of GPU architecture
Proficiency in C/C++, CUDA, and Python

Skills

PyTorch, CUDA, Python, C#, TensorRT

About the role

Responsibilities

Own the production readiness of AI applications on DGX Station, including NemoClaw, Hermes agents, and NIM microservices.
Profile and optimize deep learning workloads (PyTorch, TensorFlow, JAX) across the GB300 Blackwell multi-GPU architecture.
Identify and resolve system-level bottlenecks in GPU compute, NVLink bandwidth, host memory, PCIe, and CPU–GPU communication.
Collaborate with framework, compiler (TensorRT, NVCC, Triton), and GPU architecture teams to improve kernel fusion and memory management.
Validate multi-user and concurrent workload scenarios, ensuring reliable performance in shared workstation environments.
Maintain performance benchmarking infrastructure to track regressions across key models such as LLaMA, GPT, and Stable Diffusion.
Validate the full NVIDIA AI software stack, including CUDA toolkit, cuDNN, NCCL, and DOCA/OFED.

Requirements

BS or MS in Computer Science, Electrical Engineering, or a related field.
12+ years of experience in systems software engineering with a focus on AI/ML workload optimization or GPU performance.
Strong proficiency in deep learning frameworks such as PyTorch, TensorFlow, or JAX, including knowledge of graph execution and memory management.
Hands-on experience profiling GPU workloads using Nsight Systems, Nsight Compute, or CUPTI.
Deep understanding of GPU architecture, including compute units, memory hierarchy, and NVLink scaling.
Proficiency in C/C++, CUDA, and Python, with the ability to read and modify GPU kernels.
Experience with inference optimization techniques such as quantization (INT8/FP8) and model compilation.

Preferred Qualifications

Experience optimizing LLM training or inference on multi-GPU NVIDIA systems (DGX, HGX, or workstations).
Contributions to open-source AI frameworks, CUDA libraries, or inference engines.
Expertise in multi-GPU communication optimization, including NCCL tuning and collective operations.
Proven track record of driving hardware-specific performance improvements through collaboration with compiler teams.

About the Company

NVIDIA is the AI computing company. Our invention of the GPU in 1999 fueled the growth of PC gaming, redefined modern computer graphics, and revolutionized parallel computing. Today, NVIDIA is recognized as the leader in AI, powering the systems that perceive and interpret the world.

NVIDIA Corporation · Santa Clara