ML Ops Engineer 4 - GCP [T500-20226]

Costco IT

2 - 5 years

Hyderabad

Posted: 23/12/2025

Job Description

About Costco Wholesale

Costco Wholesale is a multi-billion-dollar global retailer with warehouse club operations in eleven countries. They provide a wide selection of quality merchandise, plus the convenience of specialty departments and exclusive member services, all designed to make shopping a pleasurable experience for their members.

About Costco Wholesale India

At Costco Wholesale India, we foster a collaborative space, working to support Costco Wholesale in developing innovative solutions that improve members' experiences and make employees' jobs easier. Our employees play a key role in driving and delivering innovation to establish IT as a core competitive advantage for Costco Wholesale.


Position Title: ML Ops Engineer 4

Roles & Responsibilities:

  • Define the long-term vision and strategy for MLOps initiatives: Set the direction for the organization's MLOps, model deployment, and monitoring practices.
  • Lead and manage a team of MLOps engineers: Provide technical guidance, mentorship, and career development for team members.
  • Identify and explore cutting-edge research areas and technologies: Stay abreast of the latest advancements in MLOps, model serving, and AI operations.
  • Drive innovation and the development of novel MLOps solutions: Lead efforts, prototype new approaches, and oversee implementation of advanced MLOps platforms.
  • Design and manage scalable ML infrastructure and pipelines on GCP; oversee model deployment (A/B testing, rollouts/rollbacks, auto-scaling); and establish monitoring/observability (performance, drift, KPIs).
  • Ensure ML operations meet governance, security, compliance, and disaster recovery standards across the organization.
  • Collaborate with executive leadership on strategic decision-making: Align MLOps initiatives with business objectives and organizational priorities.
  • Establish and enforce MLOps standards and best practices: Ensure quality, reproducibility, and security of ML systems across the organization.
  • Represent the organization in external MLOps communities: Speak at conferences, publish thought leadership, and build partnerships with academia and industry.


Technical Skills:

  • 12+ years of experience
  • Mastery of relevant technical skills: Deep expertise in MLOps, model deployment, monitoring, and governance.
  • Significant experience in designing and implementing complex MLOps systems at scale: Lead the architecture and deployment of large-scale MLOps platforms on GCP.
  • Hands-on experience architecting large-scale ML platforms on GCP (Vertex AI, GKE, Dataflow, BigQuery, Pub/Sub, Cloud Composer), implementing experiment tracking (MLflow, Weights & Biases, TensorBoard), feature stores (Vertex AI), data pipelines and workflow orchestration, and ensuring cloud security, compliance, disaster recovery, and cost optimization.
  • Strong leadership and team management skills: Build, mentor, and lead high-performing MLOps teams.
  • Excellent strategic thinking and problem-solving abilities: Translate business challenges into scalable, reliable MLOps solutions.
  • Exceptional communication and influencing skills: Advocate for MLOps initiatives, influence executive decisions, and represent the organization externally through conferences, publications, and industry engagement.


Must Have Skills:

  • Deep expertise in MLOps, model deployment, monitoring, and governance
  • Experience building scalable MLOps platforms on GCP
  • Proficiency with CI/CD for ML, containerization (e.g. Docker, Kubernetes), IaC (Terraform), and orchestration
  • Leadership in MLOps strategy, standards, and cross-team collaboration
  • Hands-on expertise with GCP ML and data services (Vertex AI, Dataflow, BigQuery, Pub/Sub, Cloud Composer, GKE)
  • Experience implementing model observability (performance monitoring, drift detection, dashboards, and alerts)
  • Proficiency with experiment tracking (MLflow, W&B) and feature store management
  • Knowledge of cloud security, compliance, and cost optimization strategies
