
ML Platform Engineer (AI Control Plane)

Talentiser

2 - 5 years

Bengaluru

Posted: 03/06/2026


Job Description

ML Platform (AI Platform Control Plane)

Team: AI Platform Engineering

About the Platform

We are building a next-generation AI platform to power intelligent, AI-driven experiences across our global marketplace. The platform's control plane exposes the full ML lifecycle through the AI Hub developer portal, serving ML researchers, Applied Scientists, and Data Engineers across the global organization with reliable, self-service tooling for experimentation, model management, and production deployment.

We focus on building the ML Platform Control Plane, composed of the AI Metadata Service, Model Management System (MMS), Experiment Management System (EMS), and Deployment Service, as well as developer-facing tooling including a Python SDK for the AI platform, Jupyter and Ray Workspace environments, the AI Hub portal built on React and Node.js, and production observability via standardized AI runtime metrics and monitoring dashboards.

About the Role

We are looking for an experienced Software Engineer specializing in AI Platform infrastructure and MLOps services to design, build, and operate the control plane that ties together the entire ML ecosystem. This is a high-impact, full-stack platform role where you will own both core backend MLOps services and the developer-facing AI Hub interface, ensuring every ML practitioner at eBay has reliable, efficient, and intuitive tools to build AI at scale.

You will work on ML Platform Control Plane services (AI Metadata Service, MMS, EMS, Deployment Service), the Experiment Management System built on MLflow, the Model Management System with Python SDK integration, Ray Workspace and JupyterHub notebook infrastructure, distributed tracing and observability across platform services, the AI Hub portal built on React and Node.js, and production monitoring dashboards, all integrated with GitOps-based CI/CD pipelines.

Key Responsibilities

Design and build the ML Platform Control Plane services, including the AI Metadata Service, Model Management System (MMS), Experiment Management System (EMS), and Deployment Service.

Develop and operate the Experiment Management System (EMS) built on MLflow for experiment tracking, metrics, artifacts, and lifecycle governance.

Build and maintain the Model Management System (MMS), including model versioning, lineage tracking, stage transitions, and deployment gating.

Design and operate the AI Metadata Service to store and serve metadata across experiments, model versions, training runs, datasets, and ML pipelines.

Build and manage AI Workspace environments, including JupyterHub and Ray Workspaces on Kubernetes.

Implement distributed tracing and observability across ML Platform services using tools such as OpenTelemetry and Jaeger.

Design and build the AI Hub portal using React and Node.js.

Develop and maintain the Python SDK for the AI platform.

Build and maintain production monitoring and dashboards using Prometheus and Grafana.

Build and operate CI/CD pipelines for ML workflows and platform services using Argo CD and GitOps-based tooling.

Collaborate with ML researchers, Applied Scientists, and Data Engineers to improve developer workflows and platform usability.

Improve reliability, scalability, and developer experience across ML Platform control plane services.

What We're Looking For

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

5+ years of experience building scalable distributed systems or platform engineering solutions.

Strong programming skills in Python and/or Java.

Proficiency in TypeScript and JavaScript for React and Node.js development.

Hands-on experience with MLOps services such as MLflow, Weights & Biases, or equivalent systems.

Experience designing and operating model management systems with versioning, lineage, and approval workflows.

Experience building metadata services and scalable data stores for ML platforms.

Hands-on experience with Jupyter Notebook and Ray Workspace environments.

Experience implementing distributed tracing across microservices and ML platform components.

Proficiency with React and Node.js for developer-facing web portals and internal tools.

Experience designing and building Python SDKs for platform consumption.

Strong expertise with monitoring and observability tooling such as Prometheus and Grafana.

Experience with Kubernetes, Docker, and GitOps-based CD tooling such as Argo CD.

Strong API design, debugging, and performance optimization skills.

