Site Reliability Engineer
Concentrix
2 - 5 years
Bengaluru
Posted: 12/02/2026
Job Description
Requirements
- 5+ years in observability, monitoring, or reliability engineering roles.
- Hands-on experience with common observability tools such as Prometheus, Grafana, Splunk, and Coralogix, as well as external monitoring tools (e.g., Catchpoint, ThousandEyes).
- Strong scripting skills in Python, plus Bash or PowerShell for automation.
- Experience with Terraform and Ansible for infrastructure automation.
- Solid understanding of SLIs, SLOs, error budgets, and reliability engineering principles.
- Familiarity with Linux environments and distributed systems.
Responsibilities
- Design and implement a Universal Dashboard in Grafana for leadership and engineering visibility.
- Ensure a consistent look and feel across all observability views.
- Define and implement SLIs, SLOs, and error budgets for critical services.
- Establish alerting thresholds and escalation workflows aligned with reliability goals.
- Integrate anomaly detection and AI-assisted insights into the observability platform.
- Contribute to self-healing workflows and automated remediation strategies.
- Partner with engineering teams to instrument services with metrics, logs, and traces.
- Provide documentation and best practices for observability adoption across teams.
