AI/ML Ops Engineer (GPU Acceleration & AI Inference)
HireAlpha
2 - 5 years
Bengaluru
Posted: 22/02/2026
Job Description
Role: AI/ML Ops Engineer (GPU Acceleration & AI Inference)
Location: Offshore Bangalore (BCIT)
Experience: 5+ Years / 7+ Years
We are looking for passionate AI/ML Ops Engineers to build and scale enterprise-grade AI platforms with a strong focus on GPU acceleration, inference optimization, and GenAI/LLM deployment.
Key Responsibilities
- Build and maintain containerized AI applications using Red Hat OpenShift, Kubernetes, and Helm.
- Deploy and optimize inference engines such as NVIDIA Triton Inference Server and vLLM.
- Accelerate AI workloads using GPU optimization techniques (TensorRT/ONNX).
- Lead model deployment, lifecycle management, and monitoring in production.
- Implement observability using Prometheus and Grafana.
- Automate CI/CD pipelines using Jenkins, Terraform, Ansible, and Groovy.
- Develop automation tools using Python.
- Architect and deploy AI/ML platforms on Amazon Web Services (SageMaker & Bedrock knowledge is a plus).
- Contribute to GenAI, LLM, and Agentic AI initiatives.
- Build scalable, high-performance, and resilient AI platforms (on-prem & cloud).
Primary Skills
- AI/ML Ops & GPU Acceleration
- Production Model Deployment
- Kubernetes & OpenShift
- AWS Cloud Architecture
Secondary Skills (1+ year or strong knowledge)
- AI Inference optimization
- NVIDIA TensorRT
- ONNX
- Triton / vLLM
