
Posted 19 hours ago
ML Engineer - Inference & Model Deployment
HiringCafe
Perks & benefits
Health Insurance, Medical Insurance, Paid Leave, Relocation Allowance
Requirements
deploying deep learning models in production, large-scale model serving, multi-GPU inference, quantization and pruning, SGLang, vLLM, or TensorRT, distributed systems, cloud platforms
Skills
Machine Learning, Python, LLM, TensorRT, GPU
About the role
Responsibilities
- Deploy and integrate researcher-trained model checkpoints into cloud infrastructure and production pipelines
- Profile and benchmark model performance to identify latency, throughput, memory, and compute bottlenecks
- Implement optimization techniques including quantization, pruning, batching, caching, and efficient attention
- Build scalable multi-GPU inference systems for search, ranking, recommendations, and AI agents
- Design reliable model-serving architecture capable of supporting millions of users
- Develop efficient training and fine-tuning workflows using distributed training and parallelism strategies
Requirements
- Experience deploying and optimizing deep learning models in production environments
- Proven track record with large-scale model serving and multi-GPU inference
- Deep understanding of inference optimization (quantization, pruning, compilation, and memory optimization)
- Proficiency with inference frameworks such as SGLang, vLLM, or TensorRT
- Ability to write clean, production-quality code and integrate ML systems into backend infrastructure
- Experience with cloud platforms, distributed systems, and modern ML serving workflows
Benefits
- Competitive base salary of $250k - $310k plus equity
- Generous health, dental, and vision coverage
- Paid parental leave
- Relocation support
About the Company
HiringCafe is building a 100x better job search engine that is fast, comprehensive, and honest. We index millions of jobs to help people find real opportunities without the noise, ads, or dark patterns found on traditional job boards.
HiringCafe · Cupertino
