Perks & benefits

Health Insurance, Medical Insurance, Paid Leave, Relocation Allowance

Requirements

Deploying deep learning models in production; large-scale model serving; multi-GPU inference; quantization and pruning; SGLang, vLLM, or TensorRT; distributed systems; cloud platforms

Skills

Machine Learning, Python, LLM, TensorRT, GPU

About the role

Responsibilities

Deploy and integrate researcher-trained model checkpoints into cloud infrastructure and production pipelines
Profile and benchmark model performance to identify latency, throughput, memory, and compute bottlenecks
Implement optimization techniques including quantization, pruning, batching, caching, and efficient attention
Build scalable multi-GPU inference systems for search, ranking, recommendations, and AI agents
Design reliable model-serving architecture capable of supporting millions of users
Develop efficient training and fine-tuning workflows using distributed training and parallelism strategies

Requirements

Experience deploying and optimizing deep learning models in production environments
Proven track record with large-scale model serving and multi-GPU inference
Deep understanding of inference optimization (quantization, pruning, compilation, and memory optimization)
Proficiency with inference frameworks such as SGLang, vLLM, or TensorRT
Ability to write clean, production-quality code and integrate ML systems into backend infrastructure
Experience with cloud platforms, distributed systems, and modern ML serving workflows

Benefits

Competitive base salary of $250k - $310k plus equity
Generous health, dental, and vision coverage
Paid parental leave
Relocation support

About the Company

HiringCafe is building a 100x better job search engine that is fast, comprehensive, and honest. We index millions of jobs to help people find real opportunities without the noise, ads, or dark patterns found on traditional job boards.

ML Engineer - Inference & Model Deployment
