Machine Learning Engineer
THG Ingenuity
2 - 5 years
Pune City
Posted: 28/04/2026
Job Description
About THG Ingenuity
THG Ingenuity is a fully integrated digital commerce ecosystem, designed to power brands without limits. Our global end-to-end tech platform comprises three products: THG Commerce, THG Studios, and THG Fulfilment. Together they form a single, unified solution that overcomes operational challenges and takes brands direct-to-consumer. Our client portfolio includes globally recognised brands such as Coca-Cola, Nestlé, Elemis, Homebase, and Procter & Gamble.
About the Team
You will join one of the largest AI teams in retail and ecommerce, operating at global scale across the full retail stack, from onsite product discovery and personalisation through to demand forecasting, pricing, fraud prevention, and warehouse fulfilment. We work across every modality (tabular, text, image, video) and combine classical techniques with state-of-the-art deep learning, NLP, and generative AI to ship solutions that are sustainable, optimised, and commercially valuable.
We are an AI-first team. That means we don't just build AI products; we build with AI. Coding agents are a core part of how our engineers work, and we expect everyone on the team to use them well: to move faster, ship higher-quality code, and spend more time on the problems that genuinely require human judgement.
Software Engineers sit at the heart of how our ML estate reaches production. You will partner closely with Machine Learning Engineers, Data Scientists, and Platform teams to turn models and pipelines into reliable, observable, low-latency services that the wider business depends on every day.
The Role
As a Software Engineer in the ML team, you will own the systems that take ML artefacts (models, features, embeddings, decisioning logic) and make them production-grade. You will design and operate the real-time serving layer, the streaming and event pipelines that feed it, the observability that keeps it healthy, and the internal tooling that lets ML engineers ship safely and quickly.
This is a hands-on backend engineering role with a strong SRE flavour. You will write production-grade code, debug latency and reliability issues in live services, carry pager rotations for the systems you build, and raise the engineering bar for how ML is delivered at THG. You will partner with ML Engineers to turn research-quality work into business-critical artefacts.
Key Responsibilities
- Productionise ML artefacts. Take models, features, and pipelines from notebooks and offline jobs into reliable, versioned, well-tested production services with clear contracts, rollback paths, and ownership.
- Own the real-time serving layer. Design, build, and operate low-latency inference services (gRPC and REST) on GCP, with explicit latency, throughput, and cost SLOs, autoscaling, graceful degradation, and safe rollout patterns (canary, shadow, A/B).
- Build streaming and event pipelines. Develop event-driven data and feature plumbing using Pub/Sub, Kafka, and Dataflow/Beam to power real-time features, online/offline parity, and downstream decisioning.
- Build internal tooling and developer experience. Ship the SDKs, CLIs, service templates, and golden paths that let ML engineers go from trained model to deployed service in hours, not weeks, with safety, observability, and compliance built in by default.
- Write production-quality code. Ship clean, reliable, fault-tolerant, well-tested Python (and, where appropriate, Go or Java) and SQL, and champion best practices including code review, pair programming, TDD, and internal knowledge-sharing.
- Partner across the ML lifecycle. Work hand-in-hand with ML Engineers and Data Scientists on feature pipelines, model packaging, evaluation harnesses, A/B and shadow testing, drift detection, and retraining triggers.
- Tackle technical debt. Modernise legacy services, reduce inference latency and cost, harden flaky pipelines, and improve reproducibility, observability, and governance across the ML estate.
- Set technical direction. Contribute to coding standards, the ML platform roadmap, architectural decisions on serving and streaming infrastructure, and mentor junior engineers.
What We're Looking For
Essential
- BSc in Computer Science, Software Engineering, or a related discipline, or equivalent practical experience.
- Proven track record as a backend / production software engineer shipping and operating services at scale, with a clear understanding of reliability, performance, and cost trade-offs.
- Strong foundations in data structures, algorithms, distributed systems, API design, and software architecture.
- Hands-on experience designing, building, and running real-time services (gRPC and/or REST) with explicit latency SLOs, autoscaling, and safe rollout patterns (canary, shadow, blue/green, A/B).
- Production experience with streaming and event-driven pipelines using Pub/Sub, Kafka, and Dataflow/Beam (or close equivalents).
- Advanced Python skills for production services, plus fluent SQL. Comfort with at least one additional production language (Go, Java, Scala, or similar) is a strong plus.
- Hands-on experience with at least one major cloud platform; Google Cloud Platform (Cloud Run/Functions, GKE, Pub/Sub, Dataflow, BigQuery, Vertex AI) is strongly preferred.
- Practical experience with containerisation (Docker), orchestration (Kubernetes), and CI/CD pipelines for production services.
- SRE-style ownership: defining SLIs/SLOs and error budgets, instrumenting services with metrics, logs, and traces (Prometheus, Grafana, OpenTelemetry, Cloud Monitoring), carrying on-call, and leading incident response and postmortems.
- Experience building internal tooling, SDKs, CLIs, or service templates that improve developer experience and shorten time-to-production for other engineers.
- AI-first mindset and hands-on experience with coding agents (Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, or similar) as part of your daily workflow. You should be able to describe, with concrete examples, how you use agents to plan, write, test, refactor, and review code, and how you manage their limitations.
- Excellent communication and stakeholder-management skills, with the ability to collaborate effectively with ML Engineers, Data Scientists, Product Managers, and commercial stakeholders.
Desirable
- Handson experience with ML serving frameworks such as Vertex AI, KServe, Triton Inference Server, TorchServe, BentoML, or Ray Serve.
- Experience integrating with feature stores (Feast, Vertex Feature Store, Tecton) and managing online/offline feature parity.
- ML-specific observability: prediction logging, drift and data-quality monitoring, model performance dashboards, shadow and A/B evaluation harnesses.
- Experience with model registries, experiment tracking (MLflow, Weights & Biases), and CI/CD pipelines tailored to ML workflows.
- Exposure to retail or ecommerce production systems: recommendations, search and ranking, personalisation, demand forecasting, pricing, fraud, or warehouse optimisation.
- Experience with agent frameworks such as Google Agent Development Kit (ADK) and Vertex AI Agent Builder, particularly in productionising agentic systems.
- Internal developer platform, paved-road, or golden-path work in a previous role.
- Open-source contributions or a strong public engineering portfolio.