Release Automation Manager
Saarthee
7 - 9 years
Bengaluru
Posted: 03/05/2026
Job Description
About Saarthee:
Saarthee is a Global Strategy, Analytics, Technology and AI consulting company, where our passion for helping others fuels our approach and our products and solutions. Our diverse and global team works with one objective in mind: Our Customers' Success. At Saarthee, we are passionate about guiding organizations towards insights-fueled success. That's why we call ourselves Saarthee, inspired by the Sanskrit word Saarthi, which means charioteer, trusted guide, or companion. Co-founded in 2015 by Mrinal Prasad and Shikha Miglani, Saarthee already encompasses all the components of Data Analytics consulting. Saarthee is based out of Philadelphia, USA, with offices in the UK and India.
Position: Senior Site Reliability Engineer
Location: Bangalore
Work Mode: Hybrid
Min-Max Experience: 7-9 years
Position Summary:
We are seeking a Senior Site Reliability Engineer (SRE) with deep expertise in observability, cloud-native infrastructure, and large-scale distributed systems to join our technology team. This role is highly hands-on and focuses on designing, building, and operating reliable, observable, and scalable platforms running on Kubernetes, with a strong preference for Google Cloud Platform (GCP) and AWS.
Your Role, Responsibilities, and Duties:
Reliability & Operations
- Design, implement, and maintain highly available and resilient systems in Kubernetes-based environments
- Define and enforce SLOs, SLIs, and error budgets
- Lead incident response, RCA, and postmortems
- Drive reliability improvements through automation
Observability (Core Focus)
- Architect and operate observability platforms for metrics, logging, tracing, and alerting
- Work with Prometheus, Alertmanager, OpenTelemetry, Grafana, Loki / ELK / OpenSearch
- Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)
- Establish actionable alerting standards
Cloud & Platform Engineering
- Build and manage infrastructure on GCP (preferred) or AWS
- Operate Kubernetes clusters (GKE preferred)
- Deploy services using Helm
- Manage containerized workloads using Docker
Automation & Tooling
- Strong Python skills with emphasis on reliability, automation, and observability tooling
- Develop automation and tooling using Python
- Create internal reliability and monitoring tools
- Integrate CI/CD pipelines with observability and reliability checks
Collaboration & Leadership
- Mentor junior engineers
- Influence architecture decisions
- Collaborate across engineering teams
Required Skills and Qualifications:
Mandatory:
- Bachelor's degree in Engineering/Technology or related discipline.
- 7-9 years of experience in Software Development and/or Linux Systems Administration.
- Strong interpersonal, written, and verbal communication skills.
- Expertise as a Linux Production Systems Engineer managing large-scale Web Services infrastructure.
- Development experience in Python (preferred) and one of Shell Scripting, Bash, Go, Java, C++, Rust.
Mandatory Skills:
Python, Site Reliability Engineer, ELK
Skills to Evaluate:
Python, Site Reliability Engineer, ELK, AWS, GCP, Kubernetes, Docker, Ansible, Packer, Jenkins, Splunk, Cribl, Terraform, Vectors, Prometheus, Linux, Helm, Datadog.
