Vice President - Site Reliability Engineering & AI Ops

Talent Toppers

5 - 10 years

Gurugram

Posted: 20/02/2026

Job Description

Experience - 15+ Years



The Vice President, AIOps and System Reliability Engineering, is a visionary technology leader responsible for driving the strategy, automation, and transformation of the enterprise IT operations landscape.



Strategic & Autonomous Operations Leadership

o Define and execute a long-term strategy for Agentic AI, autonomous operations, and AI-driven service management.

o Build and operationalize an enterprise-wide framework for autonomous IT operations based on AIOps (AI for IT operations), ensuring seamless integration with infrastructure, cloud, and SRE functions.

o Lead the implementation of AI agents, decisioning engines, and self-healing automation across service operations.


Operational Excellence & Automation Transformation

o Achieve 100% elimination of L1 support through predictive automation, intelligent routing, and autonomous resolution workflows.

o Deliver a 50% reduction in L2 support workload through AI-based diagnostics, automated remediation, and knowledge orchestration.

o Oversee the implementation of AI-driven monitoring, anomaly detection, auto-triaging, and automated incident remediation.

o Optimize IT operations through AIOps, observability platforms, and closed-loop automation.


Infrastructure Engineering & SRE Leadership

o Oversee all data center (DC) and disaster recovery (DR) operations, including servers, storage, databases, networking, and hybrid cloud infrastructure.


Innovation & Technology Modernization

o Identify, evaluate, and implement emerging technologies including Agentic AI, GenAI copilots, predictive operations, and advanced automation frameworks.

o Lead modernization of CI/CD, IaC, and DevSecOps with embedded AI and smart orchestration.

o Build a center of excellence for autonomous operations and AI-first service engineering.


Incident, Problem & Change Management Automation

o Deploy automated playbooks, AI guided root cause analysis, and recommendation engines.

o Implement self-service and conversational AI capabilities across ITSM platforms.

o Ensure proactive detection (mean time to detect, MTTD, under 5 minutes) and rapid recovery (mean time to recover, MTTR, under 1 hour) through automation.


People & Vendor Leadership

o Lead and mentor engineering, SRE, and automation teams with a strong culture of innovation and accountability.



Required Skills & Experience:

Bachelor's degree in Engineering, Computer Science, or a related field (B.E./B.Tech preferred).

15+ years of progressive experience in infrastructure engineering, SRE, DevOps, and IT operations automation.

Strong hands-on experience with AIOps platforms, Agentic AI models, and autonomous operations frameworks.

Proven background in large-scale IT modernization, observability, and reliability engineering.

Expertise in cloud operations (AWS/Azure/OCI), Kubernetes, container orchestration, and IaC.

Deep understanding of AI/ML, automation platforms, scripting (Python/Shell), and integration pipelines.

Experience with ITSM platforms, incident automation, and workflow orchestration (ServiceNow preferred).

Strong leadership capabilities with experience in driving major automation transformations.

Proven expertise in managing large-scale, distributed systems with a focus on scalability, reliability, and security.
