AI/ML Ops Engineer (GPU Acceleration & AI Inference)

HireAlpha

2 - 5 years

Bengaluru

Posted: 22/02/2026

Job Description

Role: AI/ML Ops Engineer (GPU Acceleration & AI Inference)

Location: Offshore Bangalore (BCIT)

Experience: 5+ Years / 7+ Years


We are looking for passionate AI/ML Ops Engineers to build and scale enterprise-grade AI platforms with a strong focus on GPU acceleration, inference optimization, and GenAI/LLM deployment.


Key Responsibilities

  • Build and maintain containerized AI applications using Red Hat OpenShift, Kubernetes, and Helm.
  • Deploy and optimize inference engines like NVIDIA Triton Inference Server and vLLM.
  • Accelerate AI workloads using GPU optimization techniques (TensorRT/ONNX).
  • Lead model deployment, lifecycle management, and monitoring in production.
  • Implement observability using Prometheus and Grafana.
  • Automate CI/CD pipelines using Jenkins, Terraform, Ansible, and Groovy.
  • Develop automation tools using Python.
  • Architect and deploy AI/ML platforms on Amazon Web Services (SageMaker & Bedrock knowledge is a plus).
  • Contribute to GenAI, LLM, and Agentic AI initiatives.
  • Build scalable, high-performance, and resilient AI platforms (on-prem & cloud).


Primary Skills

  • AI/ML Ops & GPU Acceleration
  • Production Model Deployment
  • Kubernetes & OpenShift
  • AWS Cloud Architecture


Secondary Skills (1+ year of experience, or strong working knowledge)

  • AI inference optimization
  • NVIDIA TensorRT
  • ONNX
  • Triton / vLLM
