AIOps Engineer
Imaging IQ
2 - 5 years
Gurugram
Posted: 20/12/2025
Job Description
Experience: 3-5 Years
About the Company
We aim to bring about a new paradigm in medical image diagnostics: intelligent, holistic, ethical, explainable, and patient-centric. We're looking for innovative problem-solvers who empathize with clinicians and patients, understand business problems, and can design and deliver reliable, intelligent products.
Key Responsibilities
CI/CD for services & models: Own pipelines (GitHub Actions/GitLab CI), environment gates, artifact/version governance (containers, models, SBOMs), safe rollouts & instant rollbacks.
Kubernetes platform (EKS preferred): Operate multi-env clusters; Helm/Kustomize; GitOps (Argo CD/Flux); progressive delivery (canary/blue-green/Argo Rollouts/Flagger).
Serving & APIs: Deploy and tune FastAPI services and Triton/ONNX/TensorRT inference; traffic shaping, runtime config, autoscaling signals.
Event-driven orchestration: Build robust consumers/producers on RabbitMQ/ActiveMQ/Kafka with back-pressure, dead-lettering, idempotency, and retry patterns.
Observability & AIOps: Define SLIs/SLOs and error budgets; metrics/logs/traces (Prometheus/Grafana/Loki/Tempo/ELK); intelligent alerting & noise reduction; basic model/data drift hooks.
Security in SDLC: Supply-chain security (image signing/provenance, SBOM scans), SAST/DAST/IaC scanning, policy-as-code (OPA/Gatekeeper), secrets hygiene in pipelines/workloads.
Data/Model platform integration: S3/MinIO for artifacts; integrate model registry (MLflow or similar) into CD; immutable, traceable releases.
Resilience & performance: Capacity planning (incl. GPU), autoscaling (HPA/VPA/KEDA), caching/queue tuning; chaos/game-days; write runbooks and own incident response for platform services.
Developer experience: Golden paths, starter repos, internal Helm charts, docs & enablement to make shipping boring and fast.
FinOps mindset: Cost dashboards, right-sizing, bin-packing, GPU utilization policies, spot vs on-demand strategy.
Skills and Qualifications (Required)
3+ years in DevOps/SRE/MLOps with strong Docker & Kubernetes fundamentals.
Production CI/CD expertise; canary/blue-green; artifact & version management.
IaC (Terraform) and GitOps workflows (Argo CD/Flux).
Observability: Prometheus/Grafana; logs/traces with Loki/Tempo/ELK.
Production message queues (RabbitMQ/ActiveMQ/Kafka) with back-pressure & retries.
Cloud experience (AWS/GCP/Azure), EKS preferred; object storage (S3/MinIO); model registries (MLflow or similar).
Security in SDLC and compliance guardrails for PHI-like data (least-privilege IAM, secrets, auditability).
Incident response experience; writing SLIs/SLOs, runbooks, and operating to error budgets.
Scripting for platform tasks (Python/Bash).
Preferred
Triton Inference Server, ONNX/TensorRT optimizations; GPU scheduling on K8s (NVIDIA device plugin, MIG, node pools).
Argo Rollouts/Flagger, Karpenter, KEDA; caching layers (Redis/NVCache patterns).
Policy-as-code (OPA/Gatekeeper), image signing (cosign), SBOM tools (syft/grype).
Network savvy for app delivery (ingress, service meshes, egress policies).
Education
BE/B.Tech (MS/M.Tech a bonus) or equivalent experience.
Location & Work Setup
On-site - Gurugram
