AI Inference Engineer
G82 Labs Pvt Ltd
2-5 years
Chennai
Posted: 10/01/2026
Job Description
Inference & LLM Performance Engineer
Experience: 5+ years
About the Role
We are deploying large-scale LLM inference inside confidential environments with strict latency, throughput, and streaming requirements.
As Head of Inference & LLM Performance, you will own everything related to model execution speed and GPU efficiency, working closely with other teams to ensure security constraints are respected.
What You'll Work With
* LLM inference stack (vLLM, TensorRT-LLM, or equivalent)
* Hugging Face model loading and integrity verification
* Token streaming semantics and batching strategies
* GPU scheduling and throughput optimization
* Latency, memory, and utilization targets
What You'll Build
* High-performance inference pipelines for large models
* Encrypted token streaming with minimal overhead
* Efficient batching across hundreds of concurrent users
* Production-grade inference behavior under TEE (trusted execution environment) constraints
Ideal Background
* Deep experience with LLM inference (vLLM, etc.)
* Strong CUDA and GPU performance understanding
* Experience running large models in production
* Comfortable working under strict security constraints
