
Posted 6 days ago
Research Scientist - Vision Language Model
Institute of Foundation Models
Requirements
PhD in Machine Learning, Computer Vision, NLP, or Multimodal AI, Experience with LLMs or VLMs, Strong Python and PyTorch skills, Experience with distributed training systems, Knowledge of multimodal datasets and processing pipelines
Skills
Python, PyTorch, Machine Learning, Computer Vision, Transformers
About the role
Responsibilities
- Research and develop next-generation Vision Language Models (VLMs) across pre-training, instruction tuning, reasoning, and agentic capabilities.
- Develop novel architectures and training methodologies for integrating visual understanding, language reasoning, and tool-use.
- Build and improve large-scale multimodal datasets, synthetic data generation pipelines, and evaluation benchmarks.
- Investigate multimodal reasoning, agentic behavior, OCR, grounding, and document/chart understanding.
- Contribute to technical reports, research publications, and open-source software.
- Mentor junior researchers and collaborate across teams to drive impactful research initiatives.
Requirements
- PhD or equivalent research experience in Machine Learning, Computer Vision, NLP, or Multimodal AI.
- Experience working with large language models (LLMs) or vision-language models (VLMs), including pre-training, fine-tuning, or inference.
- Strong Python and PyTorch development skills for large-scale machine learning research.
- Experience with distributed training systems and large-scale model optimization.
- Familiarity with multimodal datasets and data processing pipelines involving images, text, and video.
- Understanding of modern deep learning architectures, including Transformers and multimodal fusion techniques.
Preferred Qualifications
- Hands-on experience training or fine-tuning large VLMs or multimodal foundation models at scale.
- Experience with distributed training frameworks and GPU programming tools such as PyTorch Distributed, Megatron, Triton, or CUDA.
- Research experience in agentic systems, tool use, grounding, or multimodal retrieval.
- Familiarity with efficient training and inference techniques like FlashAttention, quantization, or tensor parallelism.
- Strong publication record in leading AI conferences such as NeurIPS, ICML, ICLR, CVPR, or ACL.
About the Company
The Institute of Foundation Models is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.
Institute of Foundation Models · Sunnyvale
