Posted 21 days ago
Senior MLOps & AI Infrastructure Engineer
Altera
Requirements
Bachelor's or Master's in CS, ML, or Statistics, 10+ years industry experience, 10+ years ML engineering and MLOps experience, 8+ years parallelism strategies (FSDP, DeepSpeed), 10+ years Python proficiency, 8+ years cloud ML platforms and Kubernetes, 5+ years MLflow, W&B, or Neptune
Skills
MLOps, Python, Kubernetes, PyTorch, AWS, Terraform, Docker
About the role
Responsibilities
- Architect, build, and maintain scalable ML pipelines for training, evaluation, and deployment across cloud and on-prem HPC environments
- Build MLOps infrastructure including experiment tracking, model registry, feature stores, and automated retraining workflows
- Implement CI/CD/CT pipelines for ML models using tools such as Kubeflow, MLflow, or Airflow
- Containerize ML workloads with Docker and orchestrate at scale using Kubernetes and GPU node pools
- Develop, fine-tune, and deploy large-scale models including LLMs, GNNs, and reinforcement learning agents
- Manage cloud ML infrastructure on AWS, Azure, or GCP with cost and performance optimization
- Automate infrastructure provisioning using Terraform or CloudFormation for GPU-backed environments
- Build monitoring, alerting, and observability systems for model performance drift and system health
Requirements
- Bachelor's or Master's degree in Computer Science, Machine Learning, Statistics, or a related field
- 10+ years of industry experience, including 10+ years in ML engineering and MLOps
- 8+ years of experience with parallelism strategies such as FSDP, DeepSpeed, or data/model parallelism
- 10+ years of proficiency in Python programming
- 8+ years of experience with cloud ML platforms (AWS, GCP, Azure), Docker, Kubernetes, and CI/CD pipelines
- 5+ years of hands-on experience with MLflow, W&B, or Neptune for tracking and reproducibility
Preferred Qualifications
- PhD in Computer Science, Machine Learning, or Statistics
- Experience applying ML/AI to semiconductor, EDA, or chip design domains
- Familiarity with HPC schedulers such as LSF or Slurm and GPU cluster management
- Knowledge of LLM fine-tuning, RAG architectures, and AI agent frameworks like LangChain
- Experience with graph neural networks (GNNs) or geometric deep learning
- Background in reinforcement learning for optimization problems
About the Company
Altera is the world’s largest pure-play FPGA solutions provider, delivering programmable technologies that help customers innovate across AI, cloud, networking, and edge markets. As an independent company, Altera focuses on providing leadership programmable solutions that are easy to use and deploy, helping to shape the future of the semiconductor industry.
Altera · San Jose
