Job Title: SRE Observability Engineer / Senior Observability Engineer

Experience: 5-10 Years

Location: Hyderabad - Madhapur

Employment Type: Full-Time

Notice Period: Immediate Joiners Preferred

Job Summary

We are looking for a highly skilled and forward-thinking SRE Observability Engineer to lead the design and implementation of observability solutions across complex, distributed systems. The ideal candidate should have strong expertise in monitoring, logging, and tracing tools, along with a vision for implementing AI-driven observability to enhance system reliability and performance.

This role requires close collaboration with cross-functional teams including Development, DevOps, Infrastructure, and SRE to improve system visibility, incident response, and overall platform stability.

Mandatory Skills

Strong hands-on experience in Observability Engineering
Expertise in Grafana for visualization and monitoring
Advanced experience in Prometheus & Loki, including writing complex PromQL and LogQL queries
Proven experience in implementing AI-driven observability / anomaly detection systems

Key Responsibilities

Lead the design and implementation of observability solutions (monitoring, logging, tracing) across cloud and on-prem environments
Build and manage monitoring with tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics
Implement distributed tracing frameworks like OpenTelemetry, Jaeger, or Zipkin
Optimize log management using tools like Elasticsearch, Splunk, Loki, and Fluentd
Develop advanced alerting and anomaly detection mechanisms to reduce MTTR
Collaborate with DevOps and SRE teams to integrate observability into CI/CD pipelines and microservices architecture
Automate observability workflows using scripting languages (Python, Bash, Golang)
Drive scalability and performance improvements across large-scale distributed systems
Lead incident troubleshooting, root cause analysis, and system diagnostics
Stay updated with the latest trends in observability, SRE, and AI-driven monitoring

Required Qualifications

5-10 years of experience in SRE, Observability, or DevOps roles
Strong expertise in Prometheus, Grafana, and Loki (must-have)
Experience with cloud platforms: Azure / AWS / GCP
Hands-on experience with Kubernetes and containerized environments
Strong scripting skills (Python, Bash, or Golang)
Experience with Infrastructure as Code tools (Terraform, Ansible)
Deep understanding of distributed systems, system performance, and reliability engineering
Experience in incident management and production support environments
Excellent communication and stakeholder management skills

Preferred Qualifications

Experience with AI-driven observability tools and anomaly detection techniques
Familiarity with microservices, serverless, and event-driven architectures
Experience with on-call support and incident response workflows
Relevant certifications in cloud platforms or SRE practices

Key Competencies

Strong analytical and problem-solving skills
Ownership and accountability
Leadership and mentoring ability
Ability to work in a fast-paced Agile environment

Senior Observability Engineer

Cubical Operations LLP
