Site Reliability Engineer
Enterprise Minds, Inc
2 - 5 years
Bengaluru
Posted: 15/12/2025
Job Description
Senior Site Reliability Engineer (GCP | Terraform | Ansible | SRE | On-Call)
We are looking for a high-impact Site Reliability Engineer (SRE) who will play a key role in ensuring the reliability, availability, and scalability of our production systems on Google Cloud Platform (GCP).
If you thrive in fast-paced environments, excel in incident management, and love building automated, scalable infrastructure, this role is for you.
Responsibilities
Production Reliability & On-Call Excellence
- Act as a primary responder in a 24/7 rotational on-call schedule.
- Rapidly identify, mitigate, and resolve high-severity production incidents impacting GCP services.
- Conduct detailed Root Cause Analysis (RCA) and implement long-term corrective actions.
Infrastructure-as-Code (IaC)
- Design, build, and maintain large-scale, multi-environment infrastructure using Terraform.
- Develop reusable modules, follow best practices, and maintain version-controlled infrastructure deployments.
Configuration Management
- Build and optimize Ansible playbooks and roles for configuration consistency, patching, and environment provisioning.
Automation & Tooling
- Develop automation using Python, Go, or Bash to eliminate operational toil and accelerate engineering productivity.
- Drive an automation-first culture across the SRE team.
Monitoring, Observability & Tooling
- Enhance monitoring, logging, and alerting using tools like Prometheus, Grafana, Stackdriver, or similar.
- Improve observability for proactive detection of service health degradation.
Containers & Orchestration
- Manage and troubleshoot Kubernetes (GKE) clusters for deployment, scaling, and reliability of containerized applications.
SRE Best Practices
- Define and measure SLIs/SLOs, engineer reliability, and reduce toil through automation.
- Collaborate closely with DevOps, Cloud, and Engineering teams for continuous improvement.
Requirements
Must Have
- 3+ years of hands-on experience with GCP, including GKE, GCE, VPC networking, IAM, load balancers, security, and networking fundamentals.
- Advanced expertise in Terraform for production-grade infrastructure deployments.
- Strong Ansible experience for configuration management.
- Proven experience in on-call rotations, incident response, and handling critical production issues.
- Proficiency in Python, Go, or Bash for automation.
- Strong understanding of SRE principles: SLIs/SLOs, error budgets, incident management, RCA.
- Experience with Kubernetes, containerization, and troubleshooting distributed systems.
Nice to Have
- Exposure to service mesh (Istio/Linkerd).
- Experience with CI/CD pipelines (Jenkins, GitLab CI, Cloud Build).
- Networking and security certifications, or GCP certifications (Associate Cloud Engineer / Professional Cloud DevOps Engineer).
What We Offer
- Opportunity to work on high-scale, mission-critical systems.
- A culture of ownership, innovation, and automation.
- Competitive compensation + on-call benefits.
- Growth opportunities in SRE, Cloud, and Platform Engineering tracks.
How to Apply
Share your updated resume at:
