Senior Research Engineer/Scientist - Storage for LLM at ByteDance - ScoutJobs - The AI-curated global job board
ByteDance
Posted 15 hours ago

Senior Research Engineer/Scientist - Storage for LLM

Requirements

  • PhD in Computer Science or related field
  • Understanding of transformer-based model internals
  • Experience with distributed systems and memory management
  • Proficiency in C++, Rust, Go, or CUDA
  • Familiarity with NVIDIA GPUs and TensorRT

Skills

C#, CUDA, Distributed Systems, LLM, Rust

About the role

Responsibilities

  • Design and implement a distributed KV cache system to store and retrieve intermediate states for transformer-based LLMs across GPUs or nodes.
  • Optimize low-latency access and eviction policies for caching long-context LLM inputs, token streams, and reused embeddings.
  • Collaborate with inference and serving teams to integrate the cache with token streaming pipelines, batched decoding, and model parallelism.
  • Develop cache consistency and synchronization protocols for multi-tenant, multi-request environments.
  • Implement memory-aware sharding, eviction strategies, and replication across GPUs or distributed memory backends.
  • Monitor system performance and iterate on caching algorithms to reduce compute costs and response time.
  • Evaluate and extend open-source KV stores or build custom GPU-aware caching layers using CUDA, Triton, or RDMA.

Requirements

  • PhD in Computer Science, Applied Mathematics, Electrical Engineering, or a related technical field.
  • Strong understanding of transformer-based model internals and how KV caching affects autoregressive decoding.
  • Experience with distributed systems, memory management, and low-latency serving (RPC, gRPC, CUDA-aware networking).
  • Familiarity with high-performance compute environments including NVIDIA GPUs, TensorRT, and Triton Inference Server.
  • Proficiency in systems-level development using C++, Rust, Go, or CUDA.

Preferred Qualifications

  • Prior experience building inference-serving systems for LLMs (e.g., vLLM, SGLang, FasterTransformer, DeepSpeed, or Hugging Face TGI).
  • Experience with memory hierarchy optimization (HBM, NUMA, NVLink) and GPU-to-GPU communication (NCCL, GDR, GDS, InfiniBand).
  • Exposure to cache-aware scheduling, batching, and prefetching strategies in model serving.

Benefits

  • Competitive base salary and eligibility for discretionary bonuses and restricted stock units.
  • Comprehensive medical, dental, and vision insurance.
  • 401(k) savings plan with company match.
  • Paid parental leave and wellbeing benefits.
  • Generous paid time off, including holidays and sick days.

About the Company

ByteDance's mission is to inspire creativity and enrich life. With a suite of more than a dozen products, including TikTok, Lemon8, and CapCut, ByteDance makes it easier and more fun for people to connect, consume, and create content globally.

ByteDance · Seattle
