Requirements

PhD in Machine Learning, Computer Vision, NLP, or Multimodal AI
Experience with LLMs or VLMs
Strong Python and PyTorch skills
Experience with distributed training systems
Knowledge of multimodal datasets and processing pipelines

Skills

Python, PyTorch, Machine Learning, Computer Vision, Transformers

About the role

Responsibilities

Research and develop next-generation Vision Language Models (VLMs) across pre-training, instruction tuning, reasoning, and agentic capabilities.
Develop novel architectures and training methodologies for integrating visual understanding, language reasoning, and tool use.
Build and improve large-scale multimodal datasets, synthetic data generation pipelines, and evaluation benchmarks.
Investigate multimodal reasoning, agentic behavior, OCR, grounding, and document/chart understanding.
Contribute to technical reports, research publications, and open-source software.
Mentor junior researchers and collaborate across teams to drive impactful research initiatives.

Requirements

PhD or equivalent research experience in Machine Learning, Computer Vision, NLP, or Multimodal AI.
Experience working with large language models (LLMs) or vision-language models (VLMs), including pre-training, fine-tuning, or inference.
Strong Python and PyTorch development skills for large-scale machine learning research.
Experience with distributed training systems and large-scale model optimization.
Familiarity with multimodal datasets and data processing pipelines involving images, text, and video.
Understanding of modern deep learning architectures, including Transformers and multimodal fusion techniques.

Preferred Qualifications

Hands-on experience training or fine-tuning large VLMs or multimodal foundation models at scale.
Experience with distributed training frameworks and GPU programming tools such as PyTorch Distributed, Megatron, Triton, or CUDA.
Research experience in agentic systems, tool use, grounding, or multimodal retrieval.
Familiarity with efficient training and inference techniques like FlashAttention, quantization, or tensor parallelism.
Strong publication record in leading AI conferences such as NeurIPS, ICML, ICLR, CVPR, or ACL.

About the Company

The Institute of Foundation Models is a dedicated research lab focused on building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

Research Scientist - Vision Language Model
