
Posted 16 hours ago
GPU/ML Systems Engineer
Aivar Innovations Private Limited
Requirements
3–7 years of experience, hands-on GPU optimization, vLLM or Triton Inference Server, model quantization (INT8, FP16, GPTQ, AWQ), CUDA ecosystem, AWS GPU instances, performance profiling
Skills
GPU, Machine Learning, CUDA
About the role
Responsibilities
- Deploy and tune vLLM with multi-GPU tensor parallelism, dynamic batching, and KV cache optimization for LLMs
- Configure NVIDIA Triton for production multi-model serving with custom backends and model ensembles
- Build TensorRT-LLM optimized model binaries for maximum throughput on L40S, A100, and H100 GPUs
- Implement AWS Inferentia deployments using Neuron SDK, including model compilation and performance tuning
- Execute model quantization (INT8, FP16, GPTQ, AWQ) with rigorous quality-accuracy tradeoff analysis
- Run comprehensive load testing using Locust to map performance cliffs and scaling thresholds
- Produce detailed benchmark reports with instance selection and cost-per-token recommendations
Requirements
- 3–7 years of experience with GPU-accelerated ML workloads in production
- Hands-on experience with LLM serving frameworks such as vLLM, TensorRT-LLM, or Triton Inference Server
- Deep understanding of GPU architecture, including memory hierarchy, tensor cores, NVLink, and NCCL
- Proficiency in model quantization techniques (INT8, FP16, GPTQ, AWQ)
- Strong knowledge of the CUDA ecosystem (drivers, cuDNN, NVIDIA container toolkit)
- Experience with performance profiling tools like Nsight, nvidia-smi, or DCGM
- Practical experience managing AWS GPU instances (G-series, P-series)
Preferred Qualifications
- Experience optimizing models for custom accelerators like AWS Inferentia or Trainium
- Familiarity with KServe and Prometheus + DCGM Exporter for monitoring
Benefits
- Learn from experts, including former AWS leaders and AI pioneers
- Direct ownership of high-impact "greenfield" projects from concept to launch
- Access to modern tech stacks, including the latest Generative AI frameworks
- Opportunity for rapid career growth in a high-speed environment
About the Company
Aivar Innovations is an AI-first technology partner where cutting-edge technology meets industry expertise. We provide AI-augmented teams that accelerate development, reduce time-to-market, and deliver exceptional code quality for major global enterprises.
Aivar Innovations Private Limited · Bangalore
