Monitoring & Observability Specialist
YASH Technologies
2 - 5 years
Bengaluru
Posted: 15/03/2026
Job Description
We are seeking a Full-stack Infrastructure Observability Specialist to join the Infra and Operations Team. This role will focus on building and enabling end-to-end observability strategies across applications, infrastructure, and networks. A key responsibility is to design, implement, and optimize monitoring frameworks that leverage AIOps, automation, and cloud-native observability tools to deliver proactive insights, predictive analytics, and zero-downtime operations.
You will administer and integrate observability platforms, develop intelligent alerting and dashboarding, and collaborate with cross-functional teams to ensure resilient, scalable, and secure infrastructure.
Key Responsibilities
- Observability Strategy: Define and execute a full-stack observability roadmap aligned with business and IT goals, embedding AIOps and SRE principles.
- Monitoring Frameworks: Design and implement comprehensive monitoring solutions for applications, infrastructure, and networks to ensure continuous performance and availability.
- Data Analysis & Insights: Use AIOps-driven analytics to identify trends, predict failures, and automate corrective actions.
- Tool Ownership & Integration: Manage and optimize observability tools (Splunk, Datadog, Prometheus, Grafana, ThousandEyes, ServiceNow AIOps, etc.), integrating them across hybrid environments.
- Automation & Intelligence: Develop automated workflows for alerting, incident detection, and root cause analysis using scripting and AI-driven approaches.
- Dashboarding & Reporting: Build intelligent dashboards and provide actionable insights to stakeholders on system health, incidents, and performance improvements.
- Incident & Problem Management: Partner with ITSM teams to enhance detection, triage, and resolution workflows with AI-assisted root cause analysis.
- Continuous Improvement: Stay updated with emerging observability and AIOps technologies, integrating them to enhance monitoring capabilities.
Qualifications
- 5+ years in IT infrastructure, monitoring, or observability roles.
- Strong experience in AIOps platforms and applying AI/ML for monitoring, anomaly detection, and predictive analytics.
- Expertise with observability tools: Datadog, OpManager, Splunk, Dynatrace, AppDynamics, New Relic, Prometheus, Grafana, Nagios, etc.
- Familiarity with cloud-native monitoring across AWS, Azure, GCP, and on-premises data centers.
- Proficiency in scripting/automation (Python, Shell, PowerShell, Ansible).
- Experience with DevOps and cloud-native environments (Kubernetes, Docker, Terraform, CI/CD pipelines).
- Knowledge of database monitoring (SQL and NoSQL).
- Strong analytical and problem-solving skills for proactive detection and resolution.
- Excellent communication and collaboration skills to work across IT Ops, DevOps, Security, and Application teams.
- Experience presenting monitoring insights and observability metrics to executives and stakeholders.
- Solid foundation in networking and Linux administration.
- Experience with Atlassian tooling (Jira, Confluence) preferred.
- Certifications (ITIL, DevOps, AWS, Azure, GCP, Agile, PMP) are a plus.
