Senior Machine Learning Engineer

Quantalent AI

5 - 10 years

Bengaluru

Posted: 03/06/2026

Job Description

This role is with one of our clients, an AI-powered Revenue Cycle Intelligence Platform that is transforming the healthcare billing stack with autonomous medical coding, proactive denial prevention, and workflow automation solutions.


Location: Bengaluru

Experience: 2-7 years

Role: ML Engineer / Senior AI/ML Engineer

Mode: Work from Office

Key requirement: Candidates must be from an IIT


Job Overview

We are hiring an ML Engineer / Senior AI/ML Engineer to own the end-to-end applied LLM, retrieval, and evaluation layer of our healthcare AI platform. You will build production systems that automate mid- and end-revenue-cycle workflows for US healthcare, spanning coding, claim edits, denials triage, appeal generation, and payer-rule reasoning. This is a production engineering role (not research) focused on building scalable, auditable, and cost-efficient LLM systems in a regulated healthcare environment.


What You'll Own

1. Self-Hosted LLM Infrastructure

Deploy, fine-tune, and operate open-source models (Llama, Qwen, MedGemma, and successors) as our primary inference stack

Work with vLLM / SGLang / TensorRT-LLM for serving at scale, with disciplined attention to throughput, tail latency, batching, KV-cache, and GPU economics

Own fine-tuning workflows end-to-end (SFT, LoRA, QLoRA, DPO) on clinical notes, claims, and payer-rule data

Optimize GPU usage, latency, batching, and cost; make build-vs-buy and hosted-vs-self-hosted trade-offs explicit and measured


2. Knowledge Graphs & Embedding-Based Retrieval

Design and maintain the knowledge graph encoding ICD-10-CM, CPT, HCPCS, modifiers, HCC, NCCI edits, LCD/NCD policies, and payer-specific rules, along with the relationships between them

Build embedding-based retrieval over clinical notes, historical claims, denial reasons, and payer-policy corpora, including chunking, embedding model selection, hybrid search, and reranking

Combine graph traversal and dense retrieval so every coded line, scrubbed edit, and appeal response is grounded in auditable evidence

Own ingestion, versioning, and quality of underlying knowledge sources (CMS, AHA, AMA, NCCI, payer bulletins)


3. Evaluation & Monitoring

Build continuous evaluation pipelines that gate every model, prompt, retrieval, and graph change before production

Run offline eval suites grounded in coder- and biller-validated labels; use LLM-as-judge where appropriate, calibrated against human ground truth

Monitor drift, hallucinations, regressions, and output quality in production; operate shadow-mode rollouts and per-cohort accuracy tracking (specialty, payer, chart type)

Track business metrics: chart-level and opportunity-level coding accuracy, denial rate impact, clean-claim rate, cost per chart, and end-to-end latency


4. LLM Systems & Prompt Engineering

Design prompts and context pipelines for coding (CPT, ICD, HCC, E/M), claim edits, denial classification, and appeal drafting

Implement structured outputs (JSON, function calling, constrained decoding) on top of the self-hosted stack

Apply RAG over medical coding standards (CMS, ICD-10, AHA, NCCI) and payer policies, grounded in the knowledge graph and embedding stores

Treat prompts as a thin, well-versioned, well-evaluated layer, never the load-bearing piece


5. Agentic Workflows & Tooling (MCP)

Build MCP servers for internal tools: code lookup, NCCI / rule checks, payer logic, eligibility, denial classification

Design multi-step agent workflows with audit trails and human-in-the-loop checkpoints for coder, biller, and AR-analyst review

Define deterministic vs. LLM-based tool boundaries for reliability; reliability comes from knowing which is which


What We're Looking For

Must-Have

5+ years in ML/AI engineering, including 6+ months in production LLM systems

Hands-on experience deploying and operating self-hosted LLMs (vLLM, SGLang, TensorRT-LLM, or equivalent)

Strong experience designing embedding-based retrieval and/or knowledge graphs for grounded LLM applications

Demonstrated ownership of evaluation infrastructure: offline benchmarks, online monitoring, drift and regression detection

Strong Python + PyTorch + Hugging Face experience

Production experience with monitoring, incidents, and system ownership


Strongly Preferred

Fine-tuning experience (SFT, LoRA, QLoRA, DPO) on domain-specific corpora

Experience with graph databases (Neo4j, ArangoDB, or equivalent) and graph-aware retrieval

Experience with vector databases and hybrid search (BM25 + dense, rerankers)

Familiarity with LLM observability tools (Langfuse, LangSmith, Arize, Braintrust, or in-house equivalents)

Exposure to healthcare, RCM, claims, or other regulated domains

Experience with MCP or similar tool-orchestration frameworks

Strong prompt-engineering and LLM-evaluation instincts


What We Offer

Work on high-impact healthcare AI systems used in real billing and RCM workflows

Ownership of production LLM, retrieval, and evaluation systems end-to-end

Solve real-world problems with real constraints (cost, latency, compliance, auditability)
