
Posted 21 hours ago
Senior Machine Learning Operations Engineer
Smartsheet
Requirements
5+ years of ML deployment in the cloud (AWS, GCP, Azure); 7+ years of programming (Python, Scala); 4+ years with deep learning/ML frameworks (PyTorch, TensorFlow, HuggingFace); experience with SageMaker, Glue, Lambda, and Docker; experience with REST APIs; degree in Computer Science or a related field
Skills
MLOps, Python, AWS, PyTorch, Docker, Generative AI
About the role
Responsibilities
- Architect the machine learning production lifecycle by designing infrastructure, automation, and monitoring systems.
- Automate the deployment and retraining of ML models through complete CI/CD/CT (Continuous Training) pipelines.
- Build, fine-tune, or use pre-trained LLMs, deep learning models, and traditional machine learning models.
- Implement model versioning, lineage tracking, and auditing to ensure security and ethical compliance.
- Monitor the health and performance of production models to identify and correct model drift and performance degradation.
- Provision and manage scalable cloud infrastructure using Infrastructure as Code (IaC).
- Act as a technical bridge between Data Scientists and Software Engineers to integrate ML solutions across the platform.
Requirements
- 5+ years of experience creating, deploying, and scaling machine learning solutions in cloud environments (AWS, GCP, or Azure).
- 7+ years of programming experience in languages such as Python or Scala.
- 4+ years of experience developing deep learning and traditional ML models using frameworks like PyTorch, TensorFlow, or HuggingFace.
- Hands-on experience with tools such as SageMaker, Glue, Lambda, and Docker.
- Proven experience developing, documenting, and supporting REST APIs.
- Strong applied data science skills, including the ability to recognize data patterns and evaluate ML algorithm performance.
- Degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Preferred Qualifications
- Proven ability to stay current with Generative AI advancements (e.g., OpenAI, LangChain, Stable Diffusion APIs).
- Experience with Infrastructure as Code (IaC) for managing scalable cloud environments.
About the Company
Smartsheet empowers teams to manage work seamlessly and scale solutions smarter. We are currently uniting human teams with AI agents to automate manual tasks and uncover insights at scale, creating space for people to focus on judgment, creativity, and big thinking.
Smartsheet · Bangalore
