Senior Observability Engineer

Cubical Operations LLP

10 - 12 years

Hyderabad

Posted: 04/04/2026

Job Description

Job Title: SRE Observability Engineer / Senior Observability Engineer

Experience: 5 - 10 Years

Location: Hyderabad - Madhapur

Employment Type: Full-Time

Notice Period: Immediate Joiners Preferred


Job Summary

We are looking for a highly skilled and forward-thinking SRE Observability Engineer to lead the design and implementation of observability solutions across complex, distributed systems. The ideal candidate should have strong expertise in monitoring, logging, and tracing tools, along with a vision for implementing AI-driven observability to enhance system reliability and performance.

This role requires close collaboration with cross-functional teams including Development, DevOps, Infrastructure, and SRE to improve system visibility, incident response, and overall platform stability.


Mandatory Skills

  • Strong hands-on experience in Observability Engineering
  • Expertise in Grafana for visualization and monitoring
  • Advanced experience in Prometheus & Loki, including writing complex queries
  • Proven experience in implementing AI-driven observability / anomaly detection systems


Key Responsibilities

  • Lead the design and implementation of observability solutions (monitoring, logging, tracing) across cloud and on-prem environments
  • Build and manage monitoring tools such as Prometheus, Grafana, Datadog, New Relic, and AppDynamics
  • Implement distributed tracing frameworks like OpenTelemetry, Jaeger, or Zipkin
  • Optimize log management using tools like Elasticsearch, Splunk, Loki, and Fluentd
  • Develop advanced alerting and anomaly detection mechanisms to reduce MTTR
  • Collaborate with DevOps and SRE teams to integrate observability into CI/CD pipelines and microservices architecture
  • Automate observability workflows using scripting languages (Python, Bash, Golang)
  • Drive scalability and performance improvements across large-scale distributed systems
  • Lead incident troubleshooting, root cause analysis, and system diagnostics
  • Stay updated with the latest trends in observability, SRE, and AI-driven monitoring


Required Qualifications

  • 5 - 10 years of experience in SRE, Observability, or DevOps roles
  • Strong expertise in Prometheus, Grafana, and Loki (must-have)
  • Experience with cloud platforms: Azure / AWS / GCP
  • Hands-on experience with Kubernetes and containerized environments
  • Strong scripting skills (Python, Bash, or Golang)
  • Experience with Infrastructure as Code tools (Terraform, Ansible)
  • Deep understanding of distributed systems, system performance, and reliability engineering
  • Experience in incident management and production support environments
  • Excellent communication and stakeholder management skills


Preferred Qualifications

  • Experience with AI-driven observability tools and anomaly detection techniques
  • Familiarity with microservices, serverless, and event-driven architectures
  • Experience with on-call support and incident response workflows
  • Relevant certifications in cloud platforms or SRE practices


Key Competencies

  • Strong analytical and problem-solving skills
  • Ownership and accountability
  • Leadership and mentoring ability
  • Ability to work in a fast-paced Agile environment
