ML Platform Engineer (AI Control Plane)
Talentiser
2 - 5 years
Bengaluru
Posted: 03/06/2026
Job Description
ML Platform (AI Platform Control Plane)
Team: AI Platform Engineering
About the Platform
We are building a next-generation AI platform to power intelligent, AI-driven
experiences across our global marketplace. The platform's control plane exposes the full ML
lifecycle through the AI Hub developer portal, serving ML researchers, Applied Scientists, and
Data Engineers across the global organization with reliable, self-service tooling for
experimentation, model management, and production deployment.
We focus on building the ML Platform Control Plane, composed of the AI Metadata Service,
Model Management System (MMS), Experiment Management System (EMS), and Deployment
Service, as well as developer-facing tooling including a Python SDK for the AI platform, Jupyter
and Ray Workspace environments, the AI Hub portal built on React and Node.js, and production
observability via standardized AI runtime metrics and monitoring dashboards.
About the Role
We are looking for an experienced Software Engineer specializing in AI Platform infrastructure and MLOps services to design, build, and operate the control plane that ties together the entire ML ecosystem. This is a high-impact, full-stack platform role where you will own both core
backend MLOps services and the developer-facing AI Hub interface, ensuring every ML
practitioner at eBay has reliable, efficient, and intuitive tools to build AI at scale.
You will work on ML Platform Control Plane services (AI Metadata Service, MMS, EMS,
Deployment Service), the Experiment Management System built on MLflow, the Model
Management System with Python SDK integration, the AI Metadata Service, Ray Workspace and
JupyterHub notebook infrastructure, distributed tracing and observability across platform
services, the AI Hub portal built on React and Node.js, and production monitoring
dashboards, all integrated with GitOps-based CI/CD pipelines.
Key Responsibilities
Design and build the ML Platform Control Plane services, including the AI Metadata
Service, Management Service, and Deployment Service.
Develop and operate the Experiment Management System (EMS) built on MLflow for
experiment tracking, metrics, artifacts, and lifecycle governance.
Build and maintain the Model Management System (MMS), including model versioning,
lineage tracking, stage transitions, and deployment gating.
Design and operate the AI Metadata Service to store and serve metadata across
experiments, model versions, training runs, datasets, and ML pipelines.
Build and manage AI Workspace environments, including JupyterHub and Ray
Workspaces on Kubernetes.
Implement distributed tracing and observability across ML Platform services using tools
such as OpenTelemetry and Jaeger.
Design and build the AI Hub portal using React and Node.js.
Develop and maintain the Python SDK for the AI platform.
Build and maintain production monitoring and dashboards using Prometheus and
Grafana.
Build and operate CI/CD pipelines for ML workflows and platform services using Argo
CD and GitOps-based tooling.
Collaborate with ML researchers, Applied Scientists, and Data Engineers to improve
developer workflows and platform usability.
Improve reliability, scalability, and developer experience across ML Platform control
plane services.
What We're Looking For
Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of experience building scalable distributed systems or platform engineering
solutions.
Strong programming skills in Python and/or Java.
Proficiency in TypeScript and JavaScript for React and Node.js development.
Hands-on experience with MLOps services such as MLflow, Weights & Biases, or
equivalent systems.
Experience designing and operating model management systems with versioning,
lineage, and approval workflows.
Experience building metadata services and scalable data stores for ML platforms.
Hands-on experience with Jupyter Notebook and Ray Workspace environments.
Experience implementing distributed tracing across microservices and ML platform
components.
Proficiency with React and Node.js for developer-facing web portals and internal tools.
Experience designing and building Python SDKs for platform consumption.
Strong expertise with monitoring and observability tooling such as Prometheus and
Grafana.
Experience with Kubernetes, Docker, and GitOps-based CD tooling such as Argo CD.
Strong API design, debugging, and performance optimization skills.
