Senior MLOps Engineer

Production AI Infrastructure, LLM Operations & Evaluation

Mumbai (On-site) | Full-time | 5-7 years

About the role

Unico Connect is an AI-first technology partner that builds custom mobile, web, and AI products for clients across multiple geographies. We are hiring a Senior MLOps Engineer who will own the production AI infrastructure across customer engagements, from model serving and prompt management to evaluation pipelines, observability, and unit economics.

The mandatory requirement for this role is hands-on production ownership of LLM or ML systems at meaningful scale, with at least 2 years of focused work on inference infrastructure, serving stacks, or production AI pipelines. The role is hands-on and architectural. Expect to design model serving topologies, build evaluation harnesses, harden inference pipelines, drive cost discipline, and partner closely with AI engineers, backend engineers, and DevOps so that AI features ship predictably. A typical week includes a serving cost review on an active workload, an eval pipeline build for a new feature, an incident debrief on a latency spike, and an architecture review on a new agent system.

Responsibilities

Model serving and retrieval infrastructure: Design and operate LLM serving stacks (vLLM, TGI, Triton, TorchServe, Ray Serve, or managed equivalents) and vector databases (Pinecone, Weaviate, Qdrant, pgvector, Milvus). Own decisions on batching, quantisation, caching, GPU sizing, and multi-tenant isolation.
Evaluation pipelines and quality gates: Build evaluation harnesses for AI features covering accuracy, hallucination, regression, latency, and cost. Run these as part of CI so quality is enforced before any change ships.
Prompt and model lifecycle management: Own prompt registries, versioning, model routing, A/B testing of model and prompt variants, and rollback paths. Treat prompts and model selections as production artifacts with the same rigour as code.
Observability and tracing: Instrument AI workflows with LangSmith, Langfuse, OpenTelemetry, Prometheus, and Grafana. Define SLOs for AI features and lead incident response when they are breached.
Cost and unit economics: Own per-request and per-tenant cost for AI features. Drive measurable savings through batching, prompt caching, smaller-model routing, and inference optimisation.
AI CI/CD and reproducibility: Build pipelines for model artifacts, fine-tuned weights, prompt templates, and eval suites. Ensure builds are reproducible across environments and tenants.
Architecture leadership: Set the architectural direction of the production AI stack across engagements. Document trade-offs and defend decisions in writing and in design review.
AI-assisted engineering discipline: Use Claude, Cursor, and similar tools day to day for infrastructure code, scripts, and pipelines. Set the team standard for safe use, review, and verification of AI-generated infrastructure and code.
Mentorship and enablement: Mentor mid-level and junior engineers across AI and DevOps. Run internal sessions on production AI patterns, eval design, and cost discipline.

Requirements

Hands-on production ownership of LLM or ML systems at scale (mandatory). Must have personally shipped and operated at least one LLM or ML system in production for a real workload, with operational responsibility including on-call, incident response, and scaling decisions. POCs and lab work do not qualify.
5+ years of overall engineering experience, with at least 2 years in MLOps, LLM operations, or production AI infrastructure. Candidates from adjacent DevOps or backend backgrounds who have made a clear pivot into production AI work also qualify.
AWS depth. Hands-on production experience with EC2, S3, IAM, VPC, ECR, and EKS or ECS. Working knowledge of at least one of SageMaker, Bedrock, or equivalent AI services. Comfort working across regions and managing cost.
Python proficiency for production infrastructure code. Strong with FastAPI or equivalent for service code, async patterns, packaging, testing, and dependency management. Production-grade Python, not notebook code.
Container and orchestration depth. Production experience with Docker and Kubernetes (EKS preferred). Familiarity with GPU scheduling, Helm, and operator patterns.
Inference and serving tooling. Hands-on with at least two of vLLM, TGI, Triton, TorchServe, Ray Serve, Ollama. Working knowledge of quantisation methods and formats (AWQ, GPTQ, GGUF), batching, and caching.
Evaluation and observability tooling. Hands-on with at least one of LangSmith, Langfuse, Phoenix, Ragas, Promptfoo, DeepEval. Comfortable instrumenting with OpenTelemetry, Prometheus, and Grafana.
CI/CD and infrastructure as code. Strong with GitHub Actions or GitLab CI and Terraform or Pulumi. Production experience deploying ML and AI workloads through automated pipelines.
AI-assisted engineering experience. Daily use of Claude, Cursor, Copilot, or equivalent in production work. Strong instinct for reviewing, testing, and validating AI-generated code and infrastructure before it ships.
Excellent written and spoken English. Confident in client and senior-stakeholder conversations, and comfortable owning and defending architectural decisions in writing and in design review.

Nice to have: fine-tuning experience (LoRA, QLoRA, PEFT, full fine-tune); distributed training (DeepSpeed, FSDP, Ray); MLflow, Weights & Biases, or Comet; agent infrastructure (LangGraph, AutoGen, CrewAI); multi-tenant AI platform experience; relevant AWS or ML certifications.
