Role: AI/ML Ops Engineer (GPU Acceleration & AI Inference)

Location: Offshore Bangalore (BCIT)

Experience: 5+ Years / 7+ Years

We are looking for passionate AI/ML Ops Engineers to build and scale enterprise-grade AI platforms with a strong focus on GPU acceleration, inference optimization, and GenAI/LLM deployment.

Key Responsibilities

Build and maintain containerized AI applications using Red Hat OpenShift, Kubernetes, and Helm.
Deploy and optimize inference engines such as NVIDIA Triton Inference Server and vLLM.
Accelerate AI workloads using GPU optimization techniques (TensorRT/ONNX).
Lead model deployment, lifecycle management, and monitoring in production.
Implement observability using Prometheus and Grafana.
Automate CI/CD pipelines using Jenkins, Terraform, Ansible, and Groovy.
Develop automation tools using Python.
Architect and deploy AI/ML platforms on Amazon Web Services (SageMaker & Bedrock knowledge is a plus).
Contribute to GenAI, LLM, and Agentic AI initiatives.
Build scalable, high-performance, and resilient AI platforms (on-prem & cloud).

Primary Skills

AI/ML Ops & GPU Acceleration
Production Model Deployment
Kubernetes & OpenShift
AWS Cloud Architecture

Secondary Skills (1+ year of experience or strong knowledge)

AI inference optimization
NVIDIA TensorRT
ONNX
Triton / vLLM

HireAlpha