Vice President - Site Reliability Engineering & AI Ops
Talent Toppers
5 - 10 years
Gurugram
Posted: 20/02/2026
Job Description
Experience - 15+ Years
The Vice President, AIOps and Site Reliability Engineering, is a visionary technology leader responsible for driving the strategy, automation, and transformation of the enterprise IT operations landscape.
Strategic & Autonomous Operations Leadership
o Define and execute a long-term strategy for Agentic AI, autonomous operations, and AI-driven service management.
o Build and operationalize an enterprise-wide framework for Autonomous IT Operations (AIOps), ensuring seamless integration with infrastructure, cloud, and SRE functions.
o Lead the implementation of AI agents, decisioning engines, and self-healing automation across service operations.
Operational Excellence & Automation Transformation
o Achieve 100% elimination of L1 support through predictive automation, intelligent routing, and autonomous resolution workflows.
o Deliver a 50% reduction in L2 support workload through AI-based diagnostics, automated remediation, and knowledge orchestration.
o Oversee implementation of AI-driven monitoring, anomaly detection, auto-triaging, and automated incident remediation.
o Optimize IT operations through AIOps, observability platforms, and closed-loop automation.
Infrastructure Engineering & SRE Leadership
o Oversee all DC/DR operations including servers, storage, databases, networking, and hybrid cloud infrastructure.
Innovation & Technology Modernization
o Identify, evaluate, and implement emerging technologies including Agentic AI, GenAI copilots, predictive operations, and advanced automation frameworks.
o Lead modernization of CI/CD, IaC, and DevSecOps with embedded AI and smart orchestration.
o Build a center of excellence for autonomous operations and AI-first service engineering.
Incident, Problem & Change Management Automation
o Deploy automated playbooks, AI-guided root cause analysis, and recommendation engines.
o Implement self-service and conversational AI capabilities across ITSM platforms.
o Ensure proactive detection (MTTD < 5 minutes) and rapid recovery (MTTR < 1 hour) through automation.
People & Vendor Leadership
o Lead and mentor engineering, SRE, and automation teams with a strong culture of innovation and accountability.
Required Skills & Experience:
Bachelor's degree in Engineering, Computer Science, or a related field (B.E./B.Tech preferred).
15+ years of progressive experience in infrastructure engineering, SRE, DevOps, and IT operations automation.
Strong hands-on experience with AIOps platforms, Agentic AI models, and autonomous operations frameworks.
Proven background in large-scale IT modernization, observability, and reliability engineering.
Expertise in cloud operations (AWS/Azure/OCI), Kubernetes, container orchestration, and IaC.
Deep understanding of AI/ML, automation platforms, scripting (Python/Shell), and integration pipelines.
Experience with ITSM platforms, incident automation, and workflow orchestration (ServiceNow preferred).
Strong leadership capabilities with experience in driving major automation transformations.
Proven expertise in managing large-scale, distributed systems with a focus on scalability, reliability, and security.
