Senior Machine Learning Engineer
Quantalent AI
5 - 10 years
Bengaluru
Posted: 03/06/2026
Job Description
This role is with one of our clients, an AI-powered revenue cycle intelligence platform that is transforming the healthcare billing stack with autonomous medical coding, proactive denial prevention, and workflow automation solutions.
Location: Bengaluru
Experience: 2-7yrs
Role: ML Engineer / Senior AI/ML Engineer
Mode: Work from Office
Key requirement: Candidates must be IIT graduates
Job Overview
We are hiring an ML Engineer / Senior AI/ML Engineer to own the end-to-end applied LLM, retrieval, and evaluation layer of our healthcare AI platform. You will build production systems that automate mid- and end-revenue-cycle workflows for US healthcare, spanning coding, claim edits, denials triage, appeal generation, and payer-rule reasoning. This is a production engineering role (not research) focused on building scalable, auditable, and cost-efficient LLM systems in a regulated healthcare environment.
What You'll Own
1. Self-Hosted LLM Infrastructure
Deploy, fine-tune, and operate open-source models (Llama, Qwen, MedGemma, and successors) as our primary inference stack
Work with vLLM / SGLang / TensorRT-LLM for serving at scale, with disciplined attention to throughput, tail latency, batching, KV-cache, and GPU economics
Own fine-tuning workflows end-to-end (SFT, LoRA, QLoRA, DPO) on clinical notes, claims, and payer-rule data
Optimize GPU usage, latency, batching, and cost; make build-vs-buy and hosted-vs-self-hosted trade-offs explicit and measured
2. Knowledge Graphs & Embedding-Based Retrieval
Design and maintain the knowledge graph encoding ICD-10-CM, CPT, HCPCS, modifiers, HCC, NCCI edits, LCD/NCD policies, and payer-specific rules, along with the relationships between them
Build embedding-based retrieval over clinical notes, historical claims, denial reasons, and payer-policy corpora, including chunking, embedding model selection, hybrid search, and reranking
Combine graph traversal and dense retrieval so every coded line, scrubbed edit, and appeal response is grounded in auditable evidence
Own ingestion, versioning, and quality of underlying knowledge sources (CMS, AHA, AMA, NCCI, payer bulletins)
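For candidates unfamiliar with the term, the hybrid search mentioned above can be illustrated with reciprocal rank fusion, one common way to merge a keyword ranking with a dense-embedding ranking. This is a minimal sketch only: the posting does not say which fusion method the team uses, and the document IDs, the function name, and the constant k=60 are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in; the
    scores are summed across lists. k=60 is a commonly used default,
    assumed here rather than taken from the posting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) index and a dense index.
bm25_hits = ["ncci_edit_12", "lcd_policy_7", "payer_bulletin_3"]
dense_hits = ["lcd_policy_7", "denial_note_9", "ncci_edit_12"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)
```

Documents ranked well by both retrievers rise to the top, and a reranker would typically be applied to the fused list afterwards.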
3. Evaluation & Monitoring
Build continuous evaluation pipelines that gate every model, prompt, retrieval, and graph change before production
Run offline eval suites grounded in coder- and biller-validated labels; use LLM-as-judge where appropriate, calibrated against human ground truth
Monitor drift, hallucinations, regressions, and output quality in production; operate shadow-mode rollouts and per-cohort accuracy tracking (specialty, payer, chart type)
Track business metrics: chart-level and opportunity-level coding accuracy, denial rate impact, clean-claim rate, cost per chart, and end-to-end latency
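As a rough illustration of the evaluation gating and per-cohort tracking described above, the sketch below blocks a candidate change when any cohort's accuracy drops by more than a tolerance relative to the current baseline. The class name, cohort names, counts, and the 1-point tolerance are all hypothetical; the actual pipeline and thresholds are not described in this posting.

```python
from dataclasses import dataclass


@dataclass
class CohortResult:
    """Accuracy counts for one cohort (e.g. a specialty or payer)."""
    cohort: str
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def gate(baseline, candidate, max_drop=0.01):
    """Return (passed, failures).

    The candidate fails if any cohort present in the baseline loses
    more than max_drop accuracy. max_drop is an assumed tolerance.
    """
    base = {r.cohort: r.accuracy for r in baseline}
    failures = []
    for result in candidate:
        if result.cohort not in base:
            continue  # new cohort: nothing to compare against
        drop = base[result.cohort] - result.accuracy
        if drop > max_drop:
            failures.append(
                f"{result.cohort}: {base[result.cohort]:.2f} -> {result.accuracy:.2f}"
            )
    return (not failures, failures)


# Hypothetical offline eval results on coder-validated labels.
baseline = [CohortResult("cardiology", 90, 100), CohortResult("orthopedics", 85, 100)]
candidate = [CohortResult("cardiology", 92, 100), CohortResult("orthopedics", 80, 100)]

passed, failures = gate(baseline, candidate)
print(passed, failures)
```

Here the overall accuracy barely moves, but the orthopedics cohort regresses, which is the kind of change a per-cohort gate is meant to catch before production.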
4. LLM Systems & Prompt Engineering
Design prompts and context pipelines for coding (CPT, ICD, HCC, E/M), claim edits, denial classification, and appeal drafting
Implement structured outputs (JSON, function calling, constrained decoding) on top of the self-hosted stack
Apply RAG over medical coding standards (CMS, ICD-10, AHA, NCCI) and payer policies, grounded in the knowledge graph and embedding stores
Treat prompts as a thin, well-versioned, well-evaluated layer, never the load-bearing piece
5. Agentic Workflows & Tooling (MCP)
Build MCP servers for internal tools: code lookup, NCCI / rule checks, payer logic, eligibility, denial classification
Design multi-step agent workflows with audit trails and human-in-the-loop checkpoints for coder, biller, and AR-analyst review
Define deterministic vs. LLM-based tool boundaries; reliability comes from knowing which is which
What We're Looking For
Must-Have
5+ years in ML/AI engineering, including 6+ months in production LLM systems
Hands-on experience deploying and operating self-hosted LLMs (vLLM, SGLang, TensorRT-LLM, or equivalent)
Strong experience designing embedding-based retrieval and/or knowledge graphs for grounded LLM applications
Demonstrated ownership of evaluation infrastructure: offline benchmarks, online monitoring, and drift and regression detection
Strong Python + PyTorch + Hugging Face experience
Production experience with monitoring, incidents, and system ownership
Strongly Preferred
Fine-tuning experience (SFT, LoRA, QLoRA, DPO) on domain-specific corpora
Experience with graph databases (Neo4j, ArangoDB, or equivalent) and graph-aware retrieval
Experience with vector databases and hybrid search (BM25 + dense, rerankers)
Familiarity with LLM observability tools (Langfuse, LangSmith, Arize, Braintrust, or in-house equivalents)
Exposure to healthcare, RCM, claims, or other regulated domains
Experience with MCP or similar tool-orchestration frameworks
Strong prompt-engineering and LLM-evaluation instincts
What We Offer
Work on high-impact healthcare AI systems used in real billing and RCM workflows
Ownership of production LLM, retrieval, and evaluation systems end-to-end
Solve real-world problems with real constraints (cost, latency, compliance, auditability)
