Manager - ML Delivery & Data Operations

Mechademy

5 - 10 years

Gurugram

Posted: 19/02/2026

Job Description

About the job


Mechademy's ML operations are at an inflection point. We've built internal tools that can create production-grade machine learning models in under 30 minutes, even from years of industrial sensor data. The delivery engine exists. Now we need someone to run the factory.


We're looking for a Manager, ML Delivery & Data Operations with 5-7+ years of experience to own our ML model lifecycle, client data onboarding, and operational excellence. You'll work directly with the Director of Data Science to free him from 20-25 hours/week of operational work while scaling ML model production to 30+ models daily by 2027.


Our clients include Berkshire Hathaway, Chevron, SM Energy, and Freeport LNG. When we say "zero-defect execution," we mean it: mistakes aren't an option when you're monitoring billion-dollar industrial assets.


This role is 50% operations management and 50% hands-on execution initially, shifting to 70% management as the team scales.


Key Responsibilities


Operations Management & Process Excellence (30%)

  • Triage incoming requests (model creation, data onboarding, ad-hoc analyses) and distribute work across the team
  • Establish SLAs for ML operations and data operations: define what "good" looks like and hold the team to it
  • Build processes, SOPs, and automation to reduce the current 80% manual operational burden by 40%+
  • Capacity planning: scale from current pace to 30+ models daily by 2027
  • Identify operational bottlenecks and implement systematic solutions
  • Free the Director from operational firefighting, enabling 90% strategic focus


ML Model Lifecycle Management (25%)

  • Use our internal AutoML tools to create regression models for clients (training takes <30 min per model)
  • Validate model quality: you need to understand feature engineering, feature selection, and evaluation metrics (not just accuracy, but also residuals, drift, and business-relevant metrics), and know whether a model is good enough to ship to Chevron
  • Deploy models to production environments and monitor for drift and degradation
  • Manage model retraining schedules and lifecycle
  • Build automation for model monitoring (currently manual scripts)
  • Transition from 80% manual ML ops to automated, scalable processes
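To make "not just accuracy" concrete, here is a minimal sketch of the kind of check the validation and monitoring bullets above describe. It is an illustration only, not Mechademy's internal tooling: the function names, the 3-standard-deviation threshold, and the mean-shift drift rule are all assumptions chosen for simplicity.

```python
from statistics import mean, stdev


def residual_summary(actual, predicted):
    """Summarize regression residuals: bias, RMSE, and worst-case error."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    rmse = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return {
        "bias": mean(residuals),                      # systematic over/under-prediction
        "rmse": rmse,                                 # typical error magnitude
        "max_abs": max(abs(r) for r in residuals),    # largest single miss
    }


def mean_shift_drift(reference, recent, threshold=3.0):
    """Hypothetical drift rule: flag when the recent mean is more than
    `threshold` reference standard deviations away from the reference mean."""
    ref_mean, ref_std = mean(reference), stdev(reference)
    if ref_std == 0:
        return mean(recent) != ref_mean
    return abs(mean(recent) - ref_mean) / ref_std > threshold
```

A model can have a low RMSE and still carry a bias or a large worst-case miss, which is why the summary reports all three. In practice the drift rule would likely be replaced by whatever statistical test the team standardizes on.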


Client Data Onboarding & Quality Assurance (25%)

  • Lead client dataset onboarding from raw IoT sensor data to ML-ready state
  • Prepare data for ML model training using our AutoML platform
  • Write and optimize SQL queries to inspect, transform, and validate client data
  • Implement rigorous DQA workflows: type checks, missingness detection, outlier flagging, reconciliation
  • Partner with Customer Success, Product, and Engineering to resolve data blockers
  • Ensure zero defects in client data entering ML pipelines
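The DQA checks listed above (type checks, missingness detection, outlier flagging, reconciliation) can be sketched for a single sensor column as follows. This is a hedged, standard-library illustration rather than the actual workflow: `dqa_report`, its parameters, the 5% missingness limit, and the row-count reconciliation are hypothetical, and the outlier rule is the common modified z-score based on the median absolute deviation (MAD) with a 3.5 cutoff.

```python
from statistics import median


def dqa_report(rows, column, expected_count, max_missing_frac=0.05, mad_cutoff=3.5):
    """Run basic data-quality checks on one column of a list of dict rows."""
    values = [row.get(column) for row in rows]

    def is_number(v):
        # bool is a subclass of int, so exclude it explicitly
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    missing = [v for v in values if v is None]
    bad_type = [v for v in values if v is not None and not is_number(v)]
    numeric = [v for v in values if is_number(v)]

    # Outlier flagging via modified z-score: 0.6745 * |x - median| / MAD
    outliers = []
    if numeric:
        med = median(numeric)
        mad = median([abs(v - med) for v in numeric])
        if mad > 0:
            outliers = [v for v in numeric if 0.6745 * abs(v - med) / mad > mad_cutoff]

    missing_frac = len(missing) / len(values) if values else 0.0
    row_count_matches = len(rows) == expected_count  # reconcile against the source count

    return {
        "row_count_matches": row_count_matches,
        "missing_frac": missing_frac,
        "bad_type_count": len(bad_type),
        "outliers": outliers,  # flagged for review, not an automatic failure
        "passed": row_count_matches and missing_frac <= max_missing_frac and not bad_type,
    }
```

Outliers are reported rather than failed automatically, since in industrial sensor data an extreme reading may be a real event rather than a data error.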


Team Leadership & Hiring (20%)

  • Directly manage 2-3 people initially, growing the team to 6-7 over 12-18 months
  • Conduct weekly 1:1s, performance reviews, career development planning
  • Hire and onboard ML/Data Ops Specialists with Director approval
  • Create SOPs, training materials, and knowledge transfer processes
  • Foster a culture of rigor, craftsmanship, and zero-defect execution


Required Qualifications


  • 5-7+ years in data operations, analytics delivery, ML operations, analytics engineering, or similar operational roles
  • 2+ years with direct team management responsibility (not just tech lead)
  • Strong proficiency in Python (Pandas, NumPy, Polars); production-quality code, not just notebooks
  • Ability to write optimized SQL queries for large datasets: query tuning, window functions, CTEs
  • Solid understanding of ML concepts: you should know what feature engineering and feature selection are, why models are created, how to evaluate whether a model is performing well, and what deployment means in practice. You don't need to design algorithms, but you need to look at a model's output and know whether it's good enough to ship.
  • Data validation, cleaning, anomaly detection, automated DQ workflows
  • Scripting for process automation, scheduling, orchestration
  • Demonstrated track record of building processes from scratch: SOPs, automation, SLAs, capacity planning
  • Process-driven mindset: you see a manual process and instinctively ask "how do I automate this?"
  • Comfortable starting hands-on and evolving to management as the team scales


Preferred Qualifications


  • Experience with AutoML platforms or ML lifecycle management tools (MLflow, Ray, Kubeflow)
  • Experience with orchestration tools: Airflow, Prefect, or Dagster
  • Track record of reducing manual operational burden by 40%+ through automation
  • Experience scaling operations from low volume to high volume (10x+ growth)
  • Client data onboarding experience, working with messy, real-world external data
  • ML frameworks awareness (scikit-learn, XGBoost)
  • Statistical methods for outlier detection
  • Startup or high-growth environment experience


Technologies You'll Work With


  • Languages: Python, SQL
  • ML Operations: AutoML platforms, model deployment, monitoring, drift detection
  • Data Tools: Pandas, NumPy, Polars, SQL databases
  • Automation: Scripting, scheduling, orchestration workflows
  • Process Tools: Git, Jupyter, SOPs, documentation
  • Cloud Platforms: AWS (S3, data storage)
  • Nice-to-Have: MLflow, Ray, Dagster, Airflow, Apache Iceberg


Qualifications


  • Bachelor's degree in Engineering, Computer Science, Mathematics, Statistics, Data Science, or equivalent


Bonus Points


  • Experience scaling ML production from low volume to high volume (10x+ growth)
  • Familiarity with industrial IoT, sensor data, or time-series data
  • Experience managing both data engineering and ML operations teams
  • Client data onboarding from external/enterprise sources (not just internal datasets)
  • Track record building operational automation that reduces manual work 40%+
  • Hands-on experience with distributed ML systems (Ray, Spark)


What Success Looks Like


First 30 Days: Shadow current workflows, map every operational task the Director handles, begin handling daily triage independently.


First 90 Days: Fully own 100% of the operational workload. Director's operational time drops from 40% to <15%. Establish SLAs and tracking for all requests.


First 6 Months: Manual operational burden reduced by 40%+. Team scaled to 4-5 with clear SOPs for every core workflow. ML model production visibly on trajectory toward 30+ daily.
