Event Monitoring & Observability Manager
Teamware Solutions
5 - 10 years
Hyderabad
Posted: 03/04/2026
Job Description
Greetings from Teamware Solutions, a division of Quantum Leap Consulting Pvt Ltd.
Role: Event Monitoring & Observability Manager - Monitoring strategy, alerting standards
Experience: 7-10 Yrs
Location: Hyderabad
Work Mode: Hybrid (not remote)
Shift: 2:00 PM - 11:00 PM (flexibility required)
Notice Period: Immediate to 20 Days preferred
Fastest way to get connected: LinkedIn
Job Summary :
Oversees observability strategy and execution across database, infrastructure, and application ecosystems. Defines standards, architects monitoring solutions, leads automation initiatives, and governs alerting practices. Ensures operational reliability, stakeholder alignment, and continuous improvement of observability capabilities.
Key Responsibilities :
Required Skills :
Leading enterprise observability strategies across hybrid environments (cloud, on-prem, database, application)
Architecting telemetry pipelines for logs, metrics, traces, and availability monitoring
Designing end-to-end monitoring architectures, database HA/DR oversight, and automation frameworks
Overseeing advanced performance engineering, workload profiling, and capacity forecasting
Directing automation standards using Terraform, Ansible, and versioned script repositories
Managing complex incident response, RCA governance, and high-severity issue remediation
Driving cross-platform optimization across SQL, Oracle, MySQL, PostgreSQL, NoSQL systems
Ensuring compliance with security, audit, and operational monitoring standards
Managing multi-team coordination, stakeholder communication, and vendor relationships
Leading the development of operational runbooks, documentation, and training programs
Preferred Skills :
Develops multi-year observability roadmaps and platform strategies;
Develops monitoring standards aligned with security, compliance, and cost objectives;
Develops enterprise-wide dashboards and analytics frameworks that provide actionable insights;
Develops automation programs to streamline provisioning, patching, alerting, and reporting;
Develops advanced troubleshooting methodologies, escalation flows, and operational governance;
Develops HA/DR drills, continuity assessments, and resilience testing schedules;
Develops processes for onboarding, documentation, and knowledge sharing across teams;
Develops evaluation criteria for emerging tools, conducting pilots and recommending adoption.
