Senior ML Ops Engineer

Calix

5 - 10 years

Bengaluru

Posted: 24/05/2025

Job Description

Calix is seeking a highly skilled Senior ML Ops Engineer to join our AI/ML team. In this role, you will build, scale, and maintain the infrastructure that powers our machine learning and generative AI applications. You will work closely with data scientists, ML engineers, and software developers to ensure our ML/AI systems are robust, efficient, and production-ready.

Key Responsibilities:

  • Design, implement, and maintain scalable infrastructure for ML and GenAI applications.
  • Deploy, operate, and troubleshoot production ML pipelines and generative AI services.
  • Build and optimize CI/CD pipelines for ML model deployment and serving.
  • Scale compute resources across CPU/GPU/TPU/NPU architectures to meet performance requirements.
  • Implement container orchestration with Kubernetes for ML workloads.
  • Architect and optimize cloud resources on GCP for ML training and inference.
  • Set up and maintain runtime frameworks and job management systems (Airflow, Kubeflow, MLflow).
  • Establish monitoring, logging, and alerting for ML systems observability.
  • Collaborate with data scientists and ML engineers to translate models into production systems.
  • Optimize system performance and resource utilization for cost efficiency.
  • Develop and enforce MLOps best practices across the organization.

Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
  • 5+ years of overall software engineering experience.
  • 3+ years of focused experience in MLOps or similar ML infrastructure roles.
  • Strong experience with Docker container services and Kubernetes orchestration.
  • Demonstrated expertise in cloud infrastructure management, preferably on GCP (AWS or Azure experience also valued).
  • Proficiency with workflow management and ML runtime frameworks such as Airflow, Kubeflow, and MLflow.
  • Strong CI/CD expertise with experience implementing automated testing and deployment pipelines.
  • Experience scaling distributed compute architectures across CPUs and accelerators (GPU/TPU/NPU).
  • Solid understanding of system performance optimization techniques.
  • Experience implementing comprehensive observability solutions for complex systems.
  • Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack).
  • Proficiency in at least two of the following: Shell Scripting, Python, Go, C/C++.
  • Familiarity with ML frameworks such as PyTorch and ML platforms like SageMaker or Vertex AI.
  • Excellent problem-solving skills and ability to work independently.
  • Strong communication skills and ability to work effectively in cross-functional teams.

About Company

Calix, Inc. is a cloud and software platform company headquartered in San Jose, California. It specializes in providing cloud-based software, systems, and services that enable broadband service providers to simplify operations, deliver exceptional subscriber experiences, and grow their businesses. Calix’s solutions focus on empowering communication service providers to optimize their networks, leverage advanced analytics, and create personalized customer experiences. Known for its innovation in broadband technology, Calix helps its clients transition to next-generation networks, ensuring scalability, efficiency, and improved customer satisfaction.
