Event Monitoring & Observability Manager

Teamware Solutions

5 - 10 years

Hyderabad

Posted: 03/04/2026

Job Description

Greetings from Teamware Solutions, a division of Quantum Leap Consulting Pvt Ltd.


Role: Event Monitoring & Observability Manager - Monitoring strategy, alerting standards

Experience: 7-10 years

Location: Hyderabad

Work Mode: Hybrid (not remote)

Shift: 2:00 PM - 11:00 PM (flexibility required)

Notice Period: Immediate to 20 days preferred

Fastest way to get connected: LinkedIn



Job Summary:

Oversees observability strategy and execution across database, infrastructure, and application ecosystems. Defines standards, architects monitoring solutions, leads automation initiatives, and governs alerting practices. Ensures operational reliability, stakeholder alignment, and continuous improvement of observability capabilities.


Key Responsibilities:

Required Skills:

Leading enterprise observability strategies across hybrid environments (cloud, on-prem, database, application)

Architecting telemetry pipelines for logs, metrics, traces, and availability monitoring

Designing end-to-end monitoring architectures, database HA/DR oversight, and automation frameworks

Overseeing advanced performance engineering, workload profiling, and capacity forecasting

Directing automation standards using Terraform, Ansible, and versioned script repositories

Managing complex incident response, RCA governance, and high-severity issue remediation

Driving cross-platform optimization across SQL, Oracle, MySQL, PostgreSQL, and NoSQL systems

Ensuring compliance with security, audit, and operational monitoring standards

Managing multi-team coordination, stakeholder communication, and vendor relationships

Leading the development of operational runbooks, documentation, and training programs


Preferred Skills:

Develops multi-year observability roadmaps and platform strategies

Develops monitoring standards aligned with security, compliance, and cost objectives

Develops enterprise-wide dashboards and analytics frameworks that provide actionable insights

Develops automation programs to streamline provisioning, patching, alerting, and reporting

Develops advanced troubleshooting methodologies, escalation flows, and operational governance

Develops HA/DR drills, continuity assessments, and resilience testing schedules

Develops processes for onboarding, documentation, and knowledge sharing across teams

Develops evaluation criteria for emerging tools, conducts pilots, and recommends adoption
