MLOps Engineer
TRDFIN Support Services Pvt Ltd
2 - 5 years
Bengaluru
Posted: 12/01/2026
Job Description
About the Role
We are looking for an experienced MLOps Engineer to build, automate, and maintain end-to-end machine learning pipelines and production environments. The ideal candidate has strong experience with ML model deployment, workflow orchestration, CI/CD automation, cloud platforms, and scalable architecture for real-time or batch ML systems.
You will work closely with data scientists, ML engineers, and DevOps teams to ensure models are efficiently deployed, monitored, optimized, and continuously improved.
Key Responsibilities
1. ML Pipeline Development & Automation
- Build and manage scalable ML pipelines for data preparation, training, validation, and deployment.
- Create automated workflows using tools like Kubeflow, MLflow, Airflow, Vertex AI Pipelines, or SageMaker Pipelines.
- Implement versioning of datasets, models, and experiments.
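To illustrate the kind of versioning work involved, here is a minimal sketch of deriving a deterministic version tag from a dataset's contents (the helper name is illustrative; registries like MLflow or DVC provide this in practice):

```python
import hashlib
import json

def dataset_version(records: list[dict]) -> str:
    """Derive a short, deterministic version tag from dataset contents.

    Serializing with sorted keys makes the hash independent of dict key
    ordering, so identical data always yields the same tag.
    """
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

v1 = dataset_version([{"x": 1, "y": 2}])
v2 = dataset_version([{"y": 2, "x": 1}])  # same content, different key order
v3 = dataset_version([{"x": 1, "y": 3}])  # changed content -> new version
```

Content-addressed tags like this let pipelines detect when training data has actually changed, rather than relying on file timestamps.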
2. Model Deployment & Serving
- Deploy ML models on cloud environments (AWS/GCP/Azure) or on-prem.
- Implement real-time model serving using Docker, Kubernetes, KServe, TorchServe, TensorFlow Serving, or FastAPI.
- Develop APIs for inference and integrate models into production systems.
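As a sketch of the inference-API work above, the core of such an endpoint can be kept framework-agnostic (the `predict` stand-in below is hypothetical; in production it would be a loaded PyTorch or TensorFlow artifact, wired to a FastAPI or Flask route):

```python
import json

def predict(features: list[float]) -> float:
    """Hypothetical stand-in for a real trained model."""
    return sum(features) / len(features)

def handle_inference(body: bytes) -> bytes:
    """Parse a JSON request body, run the model, return a JSON response.

    Keeping this logic out of the web framework makes it easy to unit-test
    and to reuse behind FastAPI, Flask, or a batch scoring job.
    """
    payload = json.loads(body)
    score = predict(payload["features"])
    return json.dumps({"score": score}).encode("utf-8")
```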
3. CI/CD for ML (Continuous Integration & Delivery)
- Build automated CI/CD pipelines for model training, packaging, and deployment.
- Ensure safe rollouts with canary deployments, A/B tests, and rollback strategies.
- Maintain Git-based workflows for code, model, and pipeline updates.
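One illustrative piece of the canary-rollout work above is deterministic traffic splitting between model versions (names and the 0-1 bucketing scheme are hypothetical; service meshes and serving platforms offer this natively):

```python
import hashlib

def route_request(request_id: str, canary_fraction: float) -> str:
    """Route a request to the 'canary' or 'stable' model version.

    Hashing the request id (rather than sampling randomly) keeps routing
    sticky: the same id always lands on the same variant, which simplifies
    debugging and A/B analysis.
    """
    digest = hashlib.md5(request_id.encode("utf-8")).digest()
    bucket = digest[0] / 255  # map first digest byte into [0, 1]
    return "canary" if bucket < canary_fraction else "stable"
```

Raising `canary_fraction` gradually (e.g. 0.05 → 0.25 → 1.0) while watching error metrics is the essence of a safe rollout; rollback is just setting it back to 0.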
4. Monitoring, Observability & Maintenance
- Implement end-to-end monitoring for model performance, drift detection, data quality, and serving metrics (latency, error rates).
- Set up logging and alerting using Prometheus, Grafana, ELK/EFK, or CloudWatch.
- Automate model retraining triggers based on performance thresholds or data drift.
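A minimal sketch of such a retraining trigger, assuming a simple mean-shift check (the function name and threshold are hypothetical; production systems typically use richer tests such as PSI or Kolmogorov-Smirnov):

```python
import statistics

def should_retrain(baseline: list[float], live: list[float],
                   threshold: float = 0.2) -> bool:
    """Flag retraining when the live feature mean drifts from the
    training baseline by more than `threshold` baseline standard deviations.
    """
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline) or 1.0  # guard against zero spread
    shift = abs(statistics.mean(live) - base_mean) / base_std
    return shift > threshold
```

A scheduler (e.g. Airflow) would run this check over recent inference logs and kick off the training pipeline when it returns True.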
5. Infrastructure Management
- Build and maintain cloud-based ML infrastructure (compute, storage, networking).
- Work with IaC tools like Terraform, CloudFormation, or Pulumi.
- Optimize resource usage, GPU allocation, and cost efficiency.
6. Collaboration & Documentation
- Work closely with data scientists to productionize notebooks and prototype models.
- Convert experimental code into scalable, maintainable components.
- Document workflows, architecture, pipeline steps, and best practices.
7. ML Governance, Versioning & Security
- Implement model registries (MLflow, SageMaker Model Registry, Vertex AI Model Registry).
- Ensure compliance with security, PII handling, privacy, and governance policies.
- Manage secrets, credentials, and secure access for ML systems.
Required Skills & Qualifications
Technical Skills
- Strong understanding of the ML lifecycle, model deployment, and production ML.
- Proficiency in Python and ML frameworks (PyTorch, TensorFlow, scikit-learn).
- Hands-on experience with Docker, Kubernetes, and Helm charts.
- Experience with MLflow, Kubeflow, Airflow, Jenkins, GitHub Actions, or Azure DevOps.
- Cloud experience with AWS (SageMaker, ECS/EKS), GCP (Vertex AI), or Azure ML.
- Knowledge of monitoring tools, APIs, REST/GraphQL, and microservices.
- Familiarity with feature stores (Feast, Tecton) is a plus.
Soft Skills
- Strong problem-solving and analytical mindset.
- Excellent collaboration with DS/DE/DevOps teams.
- Clear communication and documentation abilities.
- Ability to work independently and handle fast-paced environments.
Preferred Qualifications
- Experience with GPU-based training and model optimization.
- Exposure to data engineering tools (Spark, Kafka, Databricks).
- Familiarity with distributed training frameworks (Horovod, DeepSpeed).
- Prior experience deploying LLMs or other deep learning models.
