About Saarthee:

Saarthee is a global strategy, analytics, technology, and AI consulting company, where our passion for helping others fuels our approach and our products and solutions. Our diverse, global team works with one objective in mind: our customers' success. At Saarthee, we are passionate about guiding organizations towards insights-fueled success. That's why we call ourselves Saarthee, inspired by the Sanskrit word Saarthi, which means charioteer, trusted guide, or companion. Co-founded in 2015 by Mrinal Prasad and Shikha Miglani, Saarthee already encompasses all the components of data analytics consulting. Saarthee is based in Philadelphia, USA, with offices in the UK and India.

Position: Senior Site Reliability Engineer

Location: Bangalore

Work Mode: Hybrid

Min-Max Experience: 7-9 years

Position Summary:

We are seeking a Senior Site Reliability Engineer (SRE) with deep expertise in observability, cloud-native infrastructure, and large-scale distributed systems to join our technology team. This role is highly hands-on and focuses on designing, building, and operating reliable, observable, and scalable platforms running on Kubernetes, with a strong preference for Google Cloud Platform (GCP) and AWS.

Your Role, Responsibilities, and Duties:

Reliability & Operations

- Design, implement, and maintain highly available and resilient systems in Kubernetes-based environments

- Define and enforce SLOs, SLIs, and error budgets

- Lead incident response, RCA, and postmortems

- Drive reliability improvements through automation

Observability (Core Focus)

- Architect and operate observability platforms for metrics, logging, tracing, and alerting

- Work with Prometheus, Alertmanager, OpenTelemetry, Grafana, Loki / ELK / OpenSearch

- Implement cloud-native monitoring (GCP Cloud Monitoring & Logging preferred)

- Establish actionable alerting standards

Cloud & Platform Engineering

- Build and manage infrastructure on GCP (preferred) or AWS

- Operate Kubernetes clusters (GKE preferred)

- Deploy services using Helm

- Manage containerized workloads using Docker

Automation & Tooling

- Strong Python skills with emphasis on reliability, automation, and observability tooling

- Develop automation and tooling using Python

- Create internal reliability and monitoring tools

- Integrate CI/CD pipelines with observability and reliability checks

Collaboration & Leadership

- Mentor junior engineers

- Influence architecture decisions

- Collaborate across engineering teams

Required Skills and Qualifications:

Mandatory:

- Bachelor's degree in Engineering/Technology or a related discipline

- 7-9 years of experience in Software Development and/or Linux Systems Administration

- Strong interpersonal, written, and verbal communication skills

- Expertise as a Linux Production Systems Engineer managing large-scale Web Services infrastructure

- Development experience in Python (preferred) and one of Shell Scripting, Bash, Go, Java, C++, or Rust

Mandatory Skills:

Python, Site Reliability Engineering, ELK

Skills to Evaluate:

Python, Site Reliability Engineering, ELK, AWS, GCP, Kubernetes, Docker, Ansible, Packer, Jenkins, Splunk, Cribl, Terraform, Vector, Prometheus, Linux, Helm, Datadog.
