Requirements

PhD in Computer Science or related field
Understanding of transformer-based model internals
Experience with distributed systems and memory management
Proficiency in C++, Rust, Go, or CUDA
Familiarity with NVIDIA GPUs and TensorRT

Skills

C#, CUDA, Distributed Systems, LLM, Rust

About the role

Responsibilities

Design and implement a distributed KV cache system to store and retrieve intermediate states for transformer-based LLMs across GPUs or nodes.
Optimize access latency and eviction policies for cached long-context LLM inputs, token streams, and reused embeddings.
Collaborate with inference and serving teams to integrate the cache with token streaming pipelines, batched decoding, and model parallelism.
Develop cache consistency and synchronization protocols for multi-tenant, multi-request environments.
Implement memory-aware sharding, eviction strategies, and replication across GPUs or distributed memory backends.
Monitor system performance and iterate on caching algorithms to reduce compute costs and response time.
Evaluate and extend open-source KV stores or build custom GPU-aware caching layers using CUDA, Triton, or RDMA.

Requirements

PhD in Computer Science, Applied Mathematics, Electrical Engineering, or a related technical field.
Strong understanding of transformer-based model internals and how KV caching affects autoregressive decoding.
Experience with distributed systems, memory management, and low-latency serving (RPC, gRPC, CUDA-aware networking).
Familiarity with high-performance compute environments including NVIDIA GPUs, TensorRT, and Triton Inference Server.
Proficiency in systems-level development using C++, Rust, Go, or CUDA.

Preferred Qualifications

Prior experience building inference-serving systems for LLMs (e.g., vLLM, SGLang, FasterTransformer, DeepSpeed, or Hugging Face TGI).
Experience with memory hierarchy optimization (HBM, NUMA, NVLink) and GPU-to-GPU communication (NCCL, GPUDirect RDMA, GPUDirect Storage, InfiniBand).
Exposure to cache-aware scheduling, batching, and prefetching strategies in model serving.

Benefits

Competitive base salary and eligibility for discretionary bonuses and restricted stock units.
Comprehensive medical, dental, and vision insurance.
401(k) savings plan with company match.
Paid parental leave and wellbeing benefits.
Generous paid time off, including holidays and sick days.

About the Company

ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, and CapCut, ByteDance makes it easier and more fun for people to connect, consume, and create content globally.

Senior Research Engineer/Scientist - Storage for LLM
