ML Engineer - Inference & Model Deployment at HiringCafe - ScoutJobs - The AI-curated global job board
HiringCafe
Posted 19 hours ago

ML Engineer - Inference & Model Deployment

Perks & benefits

Health Insurance, Medical Insurance, Paid Leave, Relocation Allowance

Requirements

Deploying deep learning models in production; large-scale model serving; multi-GPU inference; quantization and pruning; SGLang, vLLM, or TensorRT; distributed systems; cloud platforms

Skills

Machine Learning, Python, LLM, TensorRT, GPU

About the role

Responsibilities

  • Deploy and integrate researcher-trained model checkpoints into cloud infrastructure and production pipelines
  • Profile and benchmark model performance to identify latency, throughput, memory, and compute bottlenecks
  • Implement optimization techniques including quantization, pruning, batching, caching, and efficient attention
  • Build scalable multi-GPU inference systems for search, ranking, recommendations, and AI agents
  • Design reliable model-serving architecture capable of supporting millions of users
  • Develop efficient training and fine-tuning workflows using distributed training and parallelism strategies

Requirements

  • Experience deploying and optimizing deep learning models in production environments
  • Proven track record with large-scale model serving and multi-GPU inference
  • Deep understanding of inference optimization (quantization, pruning, compilation, and memory optimization)
  • Proficiency with inference frameworks such as SGLang, vLLM, or TensorRT
  • Ability to write clean, production-quality code and integrate ML systems into backend infrastructure
  • Experience with cloud platforms, distributed systems, and modern ML serving workflows

Benefits

  • Competitive base salary of $250k - $310k plus equity
  • Generous health, dental, and vision coverage
  • Paid parental leave
  • Relocation support

About the Company

HiringCafe is building a 100x better job search engine that is fast, comprehensive, and honest. We index millions of jobs to help people find real opportunities without the noise, ads, or dark patterns found on traditional job boards.

HiringCafe · Cupertino
