Site Reliability Engineer

Enterprise Minds, Inc

2 - 5 years

Bengaluru

Posted: 15/12/2025

Job Description

Senior Site Reliability Engineer (GCP | Terraform | Ansible | SRE | On-Call)


We are looking for a high-impact Site Reliability Engineer (SRE) who will play a key role in ensuring the reliability, availability, and scalability of our production systems on Google Cloud Platform (GCP).

If you thrive in fast-paced environments, excel in incident management, and love building automated, scalable infrastructure, this role is for you.


Responsibilities

Production Reliability & On-Call Excellence

  • Act as a primary responder in a 24/7 rotational on-call schedule.
  • Rapidly identify, mitigate, and resolve high-severity production incidents impacting GCP services.
  • Conduct detailed Root Cause Analysis (RCA) and implement long-term corrective actions.

Infrastructure-as-Code (IaC)

  • Design, build, and maintain large-scale, multi-environment infrastructure using Terraform.
  • Develop reusable modules, follow best practices, and maintain version-controlled infrastructure deployments.

Configuration Management

  • Build and optimize Ansible playbooks and roles for configuration consistency, patching, and environment provisioning.

Automation & Tooling

  • Develop automation using Python, Go, or Bash to eliminate operational toil and accelerate engineering productivity.
  • Drive an automation-first culture across the SRE team.

Monitoring, Observability & Tooling

  • Enhance monitoring, logging, and alerting using tools such as Prometheus, Grafana, Stackdriver, or similar.
  • Improve observability for proactive detection of service health degradation.

Containers & Orchestration

  • Manage and troubleshoot Kubernetes (GKE) clusters for deployment, scaling, and reliability of containerized applications.

SRE Best Practices

  • Define and measure SLIs/SLOs, engineer reliability, and reduce toil through automation.
  • Collaborate closely with DevOps, Cloud, and Engineering teams for continuous improvement.


Requirements

Must Have

  • 3+ years of hands-on experience on GCP, including GKE, GCE, VPC networking, IAM, load balancers, security, and networking fundamentals.
  • Advanced expertise in Terraform for production-grade infrastructure deployments.
  • Strong Ansible experience for configuration management.
  • Proven experience in on-call rotations, incident response, and handling critical production issues.
  • Proficiency in Python, Go, or Bash for automation.
  • Strong understanding of SRE principles: SLIs/SLOs, error budgets, incident management, and RCA.
  • Experience with Kubernetes, containerization, and troubleshooting distributed systems.


Nice to Have

  • Exposure to service mesh (Istio/Linkerd).
  • Experience with CI/CD pipelines (Jenkins, GitLab CI, Cloud Build).
  • Relevant GCP certifications in networking, security, or operations (e.g., Associate Cloud Engineer or Professional Cloud DevOps Engineer).


What We Offer

  • Opportunity to work on high-scale, mission-critical systems.
  • A culture of ownership, innovation, and automation.
  • Competitive compensation + on-call benefits.
  • Growth opportunities in SRE, Cloud, and Platform Engineering tracks.


How to Apply

Share your updated resume at:
