AI/ML Ops Engineer (GPU Acceleration & AI Inference)
HireAlpha
2 - 5 years
Bengaluru
Posted: 22/02/2026
Job Description
Role: AI/ML Ops Engineer (GPU Acceleration & AI Inference)
Location: Offshore Bangalore (BCIT)
Experience: 5+ Years / 7+ Years
We are looking for passionate AI/ML Ops Engineers to build and scale enterprise-grade AI platforms with a strong focus on GPU acceleration, inference optimization, and GenAI/LLM deployment.
Key Responsibilities
- Build and maintain containerized AI applications using Red Hat OpenShift, Kubernetes, and Helm.
- Deploy and optimize inference engines like NVIDIA Triton Inference Server and vLLM.
- Accelerate AI workloads using GPU optimization techniques (TensorRT/ONNX).
- Lead model deployment, lifecycle management, and monitoring in production.
- Implement observability using Prometheus and Grafana.
- Automate CI/CD pipelines using Jenkins, Terraform, Ansible, and Groovy.
- Develop automation tools using Python.
- Architect and deploy AI/ML platforms on Amazon Web Services (SageMaker & Bedrock knowledge is a plus).
- Contribute to GenAI, LLM, and Agentic AI initiatives.
- Build scalable, high-performance, and resilient AI platforms (on-prem & cloud).
Primary Skills
- AI/ML Ops & GPU Acceleration
- Production Model Deployment
- Kubernetes & OpenShift
- AWS Cloud Architecture
Secondary Skills (1+ year of experience or strong working knowledge)
- AI inference optimization
- NVIDIA TensorRT
- ONNX
- Triton / vLLM