Site Reliability Engineer

Experience: 0-2 years

Location: Bangalore

Job Type: Permanent, full-time, work from office (WFO)

Role Overview

As a Site Reliability Engineer (SRE) focused on observability, you will support the design, implementation, and maintenance of monitoring and observability platforms for customer-facing and AI-driven applications.

This is an entry-level to early-career role in which you will work under the guidance of senior SREs to build dashboards, configure monitoring tools, and help improve service reliability and visibility across systems.

You will collaborate with engineering and operations teams to understand application behavior and contribute to building clear, actionable dashboards and monitoring solutions.

Key Responsibilities

Assist in configuring and maintaining observability tools such as Grafana, Prometheus, Loki, and Jaeger
Support the implementation of the four Golden Signals (latency, traffic, errors, saturation)
Build and maintain basic Grafana dashboards for engineering and operations teams
Help collect and validate metrics, logs, and traces from applications
Assist in troubleshooting production issues using logs and monitoring tools
Participate in monitoring performance indicators such as latency, throughput, and error rates
Support the implementation of alerting rules and basic service level objective (SLO) monitoring
Document dashboard structures, monitoring configurations, and operational runbooks
Work with senior engineers to improve dashboard usability and visualization clarity
Learn and apply SRE best practices for reliability and availability

Required Qualifications

0-2 years of experience in DevOps, SRE, monitoring, or backend engineering roles

Basic understanding of:

Linux systems
Cloud platforms (AWS / Azure / GCP)
Microservices architecture

Familiarity with monitoring tools such as Grafana or Prometheus

Basic knowledge of:

Metrics, logs, and distributed tracing concepts
HTTP status codes and API monitoring

Understanding of reliability concepts such as uptime, availability, and incident response
Good problem-solving and debugging skills
Strong willingness to learn observability engineering and production systems

Technical Skills (Good to Have)

Hands-on exposure to:

Prometheus (metrics collection)
Loki (log aggregation)
Jaeger (distributed tracing)

Basic understanding of containers (Docker) and Kubernetes
Familiarity with CI/CD pipelines
Knowledge of alerting systems and monitoring thresholds
Exposure to AI/ML or high-traffic applications

Company: NeuAlto