
MLOps Lead - Enterprise AI Platform

Rakuten Symphony

5 - 10 years

Bengaluru

Posted: 18/03/2026


Job Description

Job Title: MLOps Lead - Enterprise AI Platform (10 Years)

Location: Bangalore (Hybrid)


Why should you choose us?

Rakuten Symphony is a Rakuten Group company that provides global B2B services for the mobile telco industry and enables next-generation, cloud-based, international mobile services. Building on the technology Rakuten used to launch Japan's newest mobile network, we are taking our mobile offering global. To support our ambition to provide an innovative cloud-native telco platform for our customers, Rakuten Symphony is looking to recruit and develop top talent from around the globe. We are looking for individuals to join our team across all functional areas of our business, from sales to engineering and from support functions to product development. Let's build the future of mobile telecommunications together!


What Do We Expect From You?

We are seeking a visionary and highly experienced MLOps engineer to lead the design, development, and implementation of our enterprise-grade MLOps platform on Kubernetes. This pivotal role requires deep expertise across the entire ML lifecycle, from data ingestion and feature engineering to model training, deployment, monitoring, and governance. The MLOps Lead will be responsible for architecting a scalable, secure, multi-tenant, and compliant platform that empowers our data scientists and becomes a core product offering for our customers. This individual will act as the technical lead, guiding cross-functional teams (DevOps, Data Science, Security) and setting the strategic direction for our MLOps ecosystem.


Responsibilities:

1. Strategic Platform Architecture:

  • Lead the architectural vision, design, and continuous evolution of the platform, ensuring alignment with business objectives, security standards, and scalability requirements.
  • Drive the adoption and integration of open-source MLOps tools (Kubeflow, MLflow, Feast, KServe, Alibi-Detect, Evidently AI, Spark, etc.) into a cohesive, production-ready enterprise solution.
  • Define platform standards, best practices, and architectural patterns for MLOps development and operations.

2. Technical Leadership & Implementation Oversight:

  • Act as the primary technical authority and lead for the MLOps initiative, guiding both DevOps/Platform and MLOps/Data Science teams through the phased development plan.
  • Oversee the implementation of core platform components, ensuring robust integration, performance, and adherence to architectural blueprints.
  • Provide expert guidance on Kubernetes-native MLOps practices, distributed computing for ML (Spark, Kubeflow Training Operators), and model serving strategies (KServe).


3. Enterprise Security, Governance & Multi-Tenancy:

  • Architect and oversee the implementation of enterprise-grade security features including SSO (Keycloak), secrets management (HashiCorp Vault), and fine-grained access control (Kubernetes RBAC, OPA Gatekeeper) for data and platform resources.
  • Design and enforce multi-tenancy models that provide strong isolation, resource governance, and secure data access for internal teams and external customers.
  • Ensure the platform meets stringent compliance requirements through comprehensive audit logging, observability (Fluentd, ELK/OpenSearch, Prometheus/Grafana), and data lineage tracking.

4. ML Lifecycle & Data Management Expertise:

  • Architect and integrate a robust Feature Store (e.g., Feast) for consistent feature engineering, management, and serving across training and inference.
  • Lead the integration of MLflow for experiment tracking, model versioning, and a centralized model registry.
  • Design and implement comprehensive model monitoring solutions, including data drift and model quality detection (Alibi-Detect/Evidently AI), with integrated alerting.

5. Developer Experience & Customization:

  • Champion the developer experience for data scientists, ensuring ease of use, self-service capabilities, and efficient workflows (e.g., automated namespace provisioning, notebook environment management).
  • Provide architectural guidance for building a custom, branded UI layer on top of the open-source components, enhancing usability and aligning with product offerings.

6. Collaboration & Mentorship:

  • Collaborate extensively with Data Science, DevOps, Security, Product Management, and Business stakeholders to gather requirements, communicate technical vision, and drive platform adoption.
  • Mentor and upskill engineering teams in MLOps best practices, cloud-native development, and advanced ML techniques.

Required Skills & Expertise:


  • 10+ years of progressive experience in software engineering, data engineering, or MLOps, with at least 5 years in a lead or architect role focused on building and managing large-scale production ML platforms.
  • Expert-level proficiency with Kubernetes and its ecosystem (operators, CRDs, Helm, networking, storage).
  • Experience building and managing ML platform tools such as MLflow, Kubeflow, Airflow, SageMaker, Vertex AI, or Azure Machine Learning.
  • Deep hands-on experience with Kubeflow (Pipelines, Notebooks, Training Operators, KServe) in production environments.
  • Extensive experience with MLflow for experiment tracking, model registry, and model lifecycle management.
  • Proven expertise in designing and implementing Feature Stores (e.g., Feast) for both online and offline serving.
  • Strong background in distributed data processing technologies like Apache Spark/PySpark, especially on Kubernetes.
  • Architectural experience with enterprise security solutions including SSO (Keycloak, OAuth/OIDC), secrets management (HashiCorp Vault), and policy enforcement (Kubernetes RBAC, OPA Gatekeeper).
  • Demonstrated ability to implement comprehensive monitoring and observability stacks (Prometheus, Grafana, ELK/OpenSearch, Fluentd, Jaeger) for platform health and ML model performance/drift (Alibi-Detect, Evidently AI).
  • Proficiency in Python and experience with major ML/Deep Learning frameworks (TensorFlow, PyTorch, Scikit-learn).
  • Experience with cloud-native storage solutions (e.g., MinIO, S3, GCS) and open table formats (Iceberg, Delta Lake).
  • Excellent communication, leadership, and interpersonal skills with the ability to influence technical direction and drive complex initiatives across multiple teams.


Preferred Qualifications:

  • Experience building custom web UIs or API layers for platform products.
  • Active contributor to open-source MLOps projects.
  • Familiarity with data governance, lineage tools, and MLOps compliance frameworks.
  • Experience with GitOps methodologies (ArgoCD, Flux CD).


Rakuten Shugi Principles:

Our worldwide practices describe specific behaviours that make Rakuten unique and united across the world. We expect Rakuten employees to model these five Shugi Principles of Success.

  • Always improve, always advance. Only be satisfied with complete success - Kaizen.
  • Be passionately professional. Take an uncompromising approach to your work and be determined to be the best.
  • Hypothesize - Practice - Validate - Shikumika. Use the Rakuten Cycle to succeed in unknown territory.
  • Maximize Customer Satisfaction. The greatest satisfaction for workers in a service industry is to see their customers smile.
  • Speed!! Speed!! Speed!! Always be conscious of time. Take charge, set clear goals, and engage your team.
