Machine Learning Engineer
Valiance Solutions
2 - 5 years
Bengaluru
Posted: 26/02/2026
Job Description
About Valiance
Valiance is a deeptech AI company building sovereign and mission-critical AI solutions for enterprises, public sector, and government institutions. From predictive maintenance and demand planning to sovereign AI for citizen services, we design systems that thrive in high-stakes environments. Recognized with the NASSCOM AI Game Changers Award and the Aegis Graham Bell Award, and a certified Google Cloud Partner, our 200+ engineers and data scientists are shaping the future of industries and societies through responsible AI.
The Role
We are looking for a senior LLMOps Engineer who has taken LLM inference optimization from idea to production, not just proof of concept. You will own the end-to-end efficiency of our LLM inference infrastructure running on H200 GPUs, driving down cost and latency while maintaining the reliability our enterprise and government clients demand. This is a high-ownership, high-impact role on a team building some of India's most consequential AI systems.
What You Will Do
- Design and operate production-grade LLM inference pipelines on H200 GPU clusters, optimizing for throughput, latency, and cost per token.
- Evaluate and deploy small-to-medium open-source LLMs (e.g., Mistral, Llama, Phi, Gemma) as cost-efficient alternatives to large models without sacrificing output quality.
- Tune and manage vLLM deployments including continuous batching, paged attention, tensor parallelism, and quantization (GPTQ, AWQ, FP8) in production environments.
- Build and maintain model-serving APIs with robust observability: latency percentiles, GPU utilization, queue depths, and cost-per-request dashboards.
- Architect Kubernetes-based autoscaling strategies for inference workloads, balancing cold-start penalties against cost at scale.
- Run structured A/B experiments comparing model variants, quantization levels, and batching strategies using production traffic, not synthetic benchmarks.
- Collaborate with applied ML engineers and solution architects to identify latency and cost bottlenecks across the model serving stack.
- Establish and enforce SLOs for inference reliability, and build alerting and runbooks for production incidents.
What We Are Looking For
Non-Negotiables
- 3+ years of hands-on experience operating LLM inference in production, with demonstrable cost and latency improvements, not POC results.
- Deep expertise with vLLM in production: batching strategies, memory management, quantization tradeoffs.
- Strong Python engineering skills: clean, testable, production-ready code.
- Proficiency with Docker and Kubernetes for deploying and scaling GPU inference workloads.
- Experience building and maintaining REST/gRPC APIs for model serving at scale.
- Hands-on experience with open-source LLMs and the ability to evaluate model-quality vs. cost tradeoffs for real use cases.
Strong Advantages
- Experience with GPU memory profiling and optimization (CUDA-level awareness a plus).
- Familiarity with model distillation, speculative decoding, or flash attention implementations.
- Exposure to multi-GPU and multi-node inference setups.
- Experience with inference frameworks beyond vLLM: TGI, TensorRT-LLM, Triton Inference Server.
- Familiarity with sovereign AI or air-gapped deployment constraints.
Why Valiance
- You will work on AI systems that are actually deployed at scale and used by government institutions and large enterprises, not just demoed.
- Direct access to H200 infrastructure with meaningful compute budgets and no GPU rationing.
- A culture that rewards engineering depth and production ownership over slide decks.
- Competitive compensation with performance-linked incentives.
- Opportunity to define how Valiance builds its AI platform as we scale.
How to Apply
Upload your resume and a brief note on a specific inference optimization you shipped in production: the problem, your approach, and the measurable outcome. We do not conduct screening rounds for this role. Shortlisted candidates will move directly to a technical discussion with our engineering leadership.
