
Posted a day ago
Software Engineer, Systems ML
Meta
Requirements
- Bachelor's degree in Computer Science or related field
- 8+ years of experience in systems engineering or ML infrastructure
- Proficiency with PyTorch, JAX, or TensorFlow
- Low-level systems programming in C++ or CUDA
- Experience with distributed ML training or inference at scale
Skills
PyTorch, C++, CUDA, Machine Learning, Distributed Systems
About the role
Responsibilities
- Design and implement scalable systems for distributed ML training and inference, optimizing compute, memory, and communication.
- Develop novel techniques to accelerate AI research workflows, including training, inference, and reinforcement learning.
- Lead the architecture and end-to-end delivery of major systems ML initiatives across research and product teams.
- Establish performance benchmarking frameworks and profiling pipelines to improve training throughput and inference latency.
- Define service level objectives and reliability standards for ML training and serving systems.
- Collaborate with cross-functional partners to co-design ML systems that maximize research velocity.
- Mentor engineers on systems ML best practices, distributed training patterns, and debugging methodologies.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, or a related technical field.
- 8+ years of experience in systems engineering, machine learning infrastructure, or a closely related field.
- Experience designing and optimizing distributed ML training or inference systems at scale.
- Proficiency with machine learning frameworks such as PyTorch, JAX, or TensorFlow.
- Experience with low-level systems programming in C++ or CUDA, including performance profiling or kernel optimization.
- Proven track record of leading the technical design and delivery of complex, cross-functional systems projects.
Preferred Qualifications
- Master's or PhD in Computer Science, Electrical Engineering, or Machine Learning.
- Track record of publishing research on systems ML topics at venues such as MLSys, OSDI, SOSP, NeurIPS, or ICML.
- Experience with ML compiler stacks such as MLIR, XLA, TVM, or Triton.
- Experience with model parallelism strategies including tensor, pipeline, and expert parallelism.
- Experience with hardware-software co-design for AI accelerators.
About the Company
Meta builds technologies that help people connect, find communities, and grow businesses. From Messenger and Instagram to the next evolution of social technology in augmented and virtual reality, Meta is moving beyond 2D screens to help build the future of digital connection.
Meta · Bellevue
