Research Scientist - Vision Language Model at Institute of Foundation Models
Institute of Foundation Models
Posted 6 days ago

Research Scientist - Vision Language Model

Requirements

  • PhD in Machine Learning, Computer Vision, NLP, or Multimodal AI
  • Experience with LLMs or VLMs
  • Strong Python and PyTorch skills
  • Experience with distributed training systems
  • Knowledge of multimodal datasets and processing pipelines

Skills

Python, PyTorch, Machine Learning, Computer Vision, transformers

About the role

Responsibilities

  • Research and develop next-generation Vision Language Models (VLMs) across pre-training, instruction tuning, reasoning, and agentic capabilities.
  • Develop novel architectures and training methodologies for integrating visual understanding, language reasoning, and tool-use.
  • Build and improve large-scale multimodal datasets, synthetic data generation pipelines, and evaluation benchmarks.
  • Investigate multimodal reasoning, agentic behavior, OCR, grounding, and document/chart understanding.
  • Contribute to technical reports, research publications, and open-source software.
  • Mentor junior researchers and collaborate across teams to drive impactful research initiatives.

Requirements

  • PhD or equivalent research experience in Machine Learning, Computer Vision, NLP, or Multimodal AI.
  • Experience working with large language models (LLMs) or vision-language models (VLMs), including pre-training, fine-tuning, or inference.
  • Strong Python and PyTorch development skills for large-scale machine learning research.
  • Experience with distributed training systems and large-scale model optimization.
  • Familiarity with multimodal datasets and data processing pipelines involving images, text, and video.
  • Understanding of modern deep learning architectures, including Transformers and multimodal fusion techniques.

Preferred Qualifications

  • Hands-on experience training or fine-tuning large VLMs or multimodal foundation models at scale.
  • Experience with distributed training frameworks and GPU programming tools such as PyTorch Distributed, Megatron, Triton, or CUDA.
  • Research experience in agentic systems, tool use, grounding, or multimodal retrieval.
  • Familiarity with efficient training and inference techniques like FlashAttention, quantization, or tensor parallelism.
  • Strong publication record in leading AI conferences such as NeurIPS, ICML, ICLR, CVPR, or ACL.

About the Company

The Institute of Foundation Models is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

Institute of Foundation Models · Sunnyvale