Requirements

Experience training large-scale video models
Deep learning architectures for vision/multimodal systems
Large-scale pretraining and scaling laws
Proficiency in Python and PyTorch
Distributed training and GPU clusters
Scalable software engineering skills

Skills

PyTorch, Python, Computer Vision, Deep Learning

About the role

Responsibilities

Design and train large-scale video foundation models using internet-scale and robot-collected data
Develop pretraining strategies to capture temporal dynamics, motion, and object interaction
Build models that learn transferable representations for perception, tracking, prediction, and control
Explore transformer-based and diffusion-based architectures for video understanding and generation
Implement efficient data pipelines and training strategies for high-throughput distributed training
Optimize model performance under compute, memory, and training-efficiency constraints
Collaborate with generative modeling and robot learning teams to integrate models into the autonomy stack
Design evaluation frameworks to measure temporal understanding and generalization

Requirements

Experience training large-scale models on video data or high-dimensional sequential modalities
Strong understanding of modern deep learning architectures for vision or multimodal systems
Experience with large-scale pretraining, dataset curation, and scaling laws
Proficiency in Python and deep learning frameworks such as PyTorch
Experience working with distributed training systems and large GPU clusters
Strong experimental rigor and ability to iterate quickly on model design
Solid software engineering skills to build scalable, reliable systems
Ability to operate independently and drive high-impact research directions

Preferred Qualifications

Experience working on frontier video models or multimodal foundation models
Background in video diffusion, autoregressive video modeling, or world models
Experience at leading AI labs (e.g., OpenAI, Google DeepMind, ByteDance)
Experience with large-scale dataset construction and filtering
Familiarity with robotics, embodied AI, or learning from egocentric video
Publication record in machine learning, computer vision, or multimodal AI

About the Company

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. Our goal is to build embodied AI systems that can perceive, reason, and act in the real world.

Helix AI Engineer, Video Pretraining
