Institute of Foundation Models

Data Engineer

Posted 2 months ago

Employment Type

Full Time

Location

Abu Dhabi

Experience

Mid Level, Senior

Benefits

Annual Leave, Health Insurance, Visa, Relocation Allowance

Requirements

Python, Data pipelines, Web crawling, SQL, Cloud platforms

Job Description

Responsibilities

  • Rapidly collect, curate, and preprocess datasets based on detailed specifications provided by NLP researchers, delivering data within tight timelines (typically within 1-2 days).
  • Develop and maintain efficient web crawling solutions, APIs, and automated workflows to continuously improve data collection processes.
  • Refine and evaluate outputs from Large Language Models (LLMs) to generate structured datasets suitable for model training and benchmarking.
  • Implement scalable data pipelines, ensuring efficient data processing, storage, retrieval, and distribution to research teams.
  • Collaborate closely with researchers and engineers to ensure collected data meets specified quality and relevance criteria.
  • Document data collection methodologies, dataset characteristics, and pipeline architecture clearly and effectively.
  • Engage with peer teams and participate in technical reviews to uphold best practices and data quality standards.
  • Represent MBZUAI at industry and research forums, showcasing technical capabilities in large-scale data processing and AI data infrastructure.
  • Perform other duties consistent with these objectives, as reasonably directed by the line manager.

Requirements

  • Bachelor’s degree in Computer Science, Data Science, Engineering, or a related technical field
  • Extensive experience in data engineering, data processing, and automation using Python
  • Proficiency in designing and deploying web crawling solutions, automated data extraction, and processing pipelines
  • Strong understanding of data structures, algorithms, databases, SQL, and performance optimization
  • Experience working with cloud infrastructure and distributed data processing frameworks (e.g., AWS, Spark, Kafka, Kubernetes)
  • Excellent problem-solving abilities and attention to detail
  • Strong communication and collaboration skills

Preferred Qualifications

  • Master’s degree or equivalent experience in Computer Science, Data Engineering, or related technical fields
  • Proven track record supporting NLP or AI research teams with rapid data delivery
  • Experience refining outputs from large-scale AI models, such as LLM-generated data
  • Contributions to open-source projects or visible activity in coding communities (e.g., GitHub, Stack Overflow)
  • Familiarity with advancements in NLP data processing and large language model technologies

Benefits

  • Health Insurance
  • Annual Leave
  • Visa
  • Relocation Allowance

About the Company

The Institute of Foundation Models is a dedicated research lab committed to building, understanding, using, and risk-managing foundation models. Our mission is to advance research, nurture future AI builders, and contribute transformative innovations for a knowledge-driven economy. As part of our Abu Dhabi-based team, you will collaborate with world-class researchers, data scientists, and engineers, developing AI solutions with the power to shape whole industries. The institute strives to inspire the next generation of AI pioneers and establish itself as a global leader in high-performance deep learning research.
