Site Reliability Engineer
PwC India
2 - 5 years
Bengaluru
Posted: 04/04/2026
Job Description
Opportunity
We are looking for SREs who want to define what reliability means for the next generation of industrial software. The role involves defining SLIs/SLOs, building observability platforms, and establishing incident management processes.
Responsibilities
- Define and implement SLI/SLO frameworks for complex engineering systems across manufacturing and industrial clients
- Design and deploy observability platforms using Prometheus, Grafana, and Datadog
- Establish incident management processes and lead blameless post-mortems
- Implement chaos engineering practices to proactively identify system weaknesses
- Drive toil elimination through automation and platform improvements
- Build reliability engineering capabilities within the practice and client organisations
Essential Skills
- SLI/SLO definition and implementation at enterprise scale
- Observability: Prometheus, Grafana, Datadog, New Relic
- Incident management and post-mortem facilitation
- Chaos engineering: Gremlin, Chaos Monkey, Litmus
- Python testing for reliability validation and automated runbooks
- Automation and scripting: Python, Go, Bash
- Cloud platforms: AWS, Azure, GCP
Experience
5-10 years in SRE or Production Engineering roles with experience in enterprise or industrial environments
