Skills

MLOps, Python, Kubernetes, PyTorch, AWS, Terraform, Docker

About the role

Responsibilities

Architect, build, and maintain scalable ML pipelines for training, evaluation, and deployment across cloud and on-prem HPC environments
Build MLOps infrastructure including experiment tracking, model registry, feature stores, and automated retraining workflows
Implement CI/CD/CT pipelines for ML models using tools such as Kubeflow, MLflow, or Airflow
Containerize ML workloads with Docker and orchestrate at scale using Kubernetes and GPU node pools
Develop, fine-tune, and deploy large-scale models including LLMs, GNNs, and reinforcement learning agents
Manage cloud ML infrastructure on AWS, Azure, or GCP with cost and performance optimization
Automate infrastructure provisioning using Terraform or CloudFormation for GPU-backed environments
Build monitoring, alerting, and observability systems for model performance drift and system health

Requirements

Bachelor's or Master's degree in Computer Science, Machine Learning, Statistics, or a related field
10+ years of industry experience, including 10+ years in ML engineering and MLOps
8+ years of experience with parallelism strategies such as FSDP, DeepSpeed, or data/model parallelism
10+ years of Python programming experience
8+ years of experience with cloud ML platforms (AWS, GCP, Azure), Docker, Kubernetes, and CI/CD pipelines
5+ years of hands-on experience with MLflow, W&B, or Neptune for tracking and reproducibility

Preferred Qualifications

PhD in Computer Science, Machine Learning, or Statistics
Experience applying ML/AI to semiconductor, EDA, or chip design domains
Familiarity with HPC schedulers such as LSF or Slurm and GPU cluster management
Knowledge of LLM fine-tuning, RAG architectures, and AI agent frameworks like LangChain
Experience with graph neural networks (GNNs) or geometric deep learning
Background in reinforcement learning for optimization problems

About the Company

Altera is the world’s largest pure-play FPGA solutions provider, delivering programmable technologies that help customers innovate across AI, cloud, networking, and edge markets. As an independent company, Altera focuses on providing leading programmable solutions that are easy to use and deploy, helping to shape the future of the semiconductor industry.

Senior MLOps & AI Infrastructure Engineer
