Machine Learning Engineer

Valiance Solutions

2 - 5 years

Bengaluru

Posted: 26/02/2026

Job Description

About Valiance

Valiance is a deeptech AI company building sovereign and mission-critical AI solutions for enterprises, public sector, and government institutions. From predictive maintenance and demand planning to sovereign AI for citizen services, we design systems that thrive in high-stakes environments. Recognized with the NASSCOM AI Game Changers Award and the Aegis Graham Bell Award, and a certified Google Cloud Partner, our 200+ engineers and data scientists are shaping the future of industries and societies through responsible AI.


The Role

We are looking for a senior LLMOps Engineer who has taken LLM inference optimization from idea to production, not just proof of concept. You will own the end-to-end efficiency of our LLM inference infrastructure running on H200 GPUs, driving down cost and latency while maintaining the reliability our enterprise and government clients demand. This is a high-ownership, high-impact role on a team building some of India's most consequential AI systems.


What You Will Do

  • Design and operate production-grade LLM inference pipelines on H200 GPU clusters, optimizing for throughput, latency, and cost per token.
  • Evaluate and deploy small-to-medium open-source LLMs (e.g., Mistral, Llama, Phi, Gemma) as cost-efficient alternatives to large models without sacrificing output quality.
  • Tune and manage vLLM deployments including continuous batching, paged attention, tensor parallelism, and quantization (GPTQ, AWQ, FP8) in production environments.
  • Build and maintain model-serving APIs with robust observability: latency percentiles, GPU utilization, queue depths, and cost-per-request dashboards.
  • Architect Kubernetes-based autoscaling strategies for inference workloads, balancing cold-start penalties against cost at scale.
  • Run structured A/B experiments comparing model variants, quantization levels, and batching strategies using production traffic, not synthetic benchmarks.
  • Collaborate with applied ML engineers and solution architects to identify latency and cost bottlenecks across the model serving stack.
  • Establish and enforce SLOs for inference reliability, and build alerting and runbooks for production incidents.
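To give a concrete flavor of the observability work above, here is a minimal sketch of computing tail-latency percentiles and amortized cost per request from raw request timings. All numbers and the nearest-rank percentile choice are illustrative assumptions, not a description of our actual stack.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (in ms)."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

def cost_per_request(gpu_hour_usd, requests_per_hour):
    """Amortized GPU cost per served request."""
    return gpu_hour_usd / requests_per_hour

# Illustrative request latencies (ms) and a hypothetical hourly GPU rate.
latencies = [120, 95, 210, 130, 180, 90, 400, 150]
print(percentile(latencies, 95))            # p95 tail latency
print(cost_per_request(3.50, 12_000))       # USD per request
```

In practice these metrics would be emitted continuously to a dashboard rather than computed in batch; the p95/p99 percentiles and cost-per-request are the quantities the SLOs in this role would be defined against.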


What We Are Looking For

Non-Negotiables

  • 3+ years of hands-on experience operating LLM inference in production, with demonstrable cost and latency improvements, not POC results.
  • Deep expertise with vLLM in production: batching strategies, memory management, quantization tradeoffs.
  • Strong Python engineering skills: clean, testable, production-ready code.
  • Proficiency with Docker and Kubernetes for deploying and scaling GPU inference workloads.
  • Experience building and maintaining REST/gRPC APIs for model serving at scale.
  • Hands-on experience with open-source LLMs and the ability to evaluate model-quality vs. cost tradeoffs for real use cases.
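The model-quality vs. cost tradeoff in the last point can be framed as a simple constrained selection: pick the cheapest model that clears a quality floor for the use case. The model names, eval scores, and per-million-token costs below are hypothetical placeholders, not benchmark results.

```python
# Hypothetical eval scores and serving costs, for illustration only.
candidates = {
    "large-70b": {"quality": 0.92, "usd_per_mtok": 1.80},
    "medium-8b": {"quality": 0.88, "usd_per_mtok": 0.25},
    "small-3b":  {"quality": 0.79, "usd_per_mtok": 0.08},
}

def cheapest_meeting_floor(models, quality_floor):
    """Return the lowest-cost model whose eval score clears the floor."""
    viable = {name: m for name, m in models.items()
              if m["quality"] >= quality_floor}
    if not viable:
        return None
    return min(viable, key=lambda name: viable[name]["usd_per_mtok"])

print(cheapest_meeting_floor(candidates, 0.85))  # → "medium-8b"
```

Real evaluations would of course use task-specific metrics and production traffic rather than a single scalar score, but the decision shape is the same.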


Strong Advantages

  • Experience with GPU memory profiling and optimization (CUDA-level awareness a plus).
  • Familiarity with model distillation, speculative decoding, or flash attention implementations.
  • Exposure to multi-GPU and multi-node inference setups.
  • Experience with inference frameworks beyond vLLM: TGI, TensorRT-LLM, Triton Inference Server.
  • Familiarity with sovereign AI or air-gapped deployment constraints.


Why Valiance

  • You will work on AI systems that are actually deployed at scale, used by government institutions and large enterprises, not just demoed.
  • Direct access to H200 infrastructure with meaningful compute budgets, with no GPU rationing.
  • A culture that rewards engineering depth and production ownership over slide decks.
  • Competitive compensation with performance-linked incentives.
  • Opportunity to define how Valiance builds its AI platform as we scale.

How to Apply

Upload your resume and a brief note on a specific inference optimization you shipped in production: the problem, your approach, and the measurable outcome. We do not conduct screening rounds for this role. Shortlisted candidates will move directly to a technical discussion with our engineering leadership.
