Site Reliability Engineer

NeuAlto

2 - 4 years

Bengaluru

Posted: 21/02/2026

Job Description

Experience: 02 Years

Location: Bangalore

Job Type: Permanent, full-time, work from office (WFO)

Role Overview

As a Site Reliability Engineer (SRE) focused on observability, you will support the design, implementation, and maintenance of monitoring and observability platforms for customer-facing and AI-driven applications.

This is an entry-level to early-career role in which you will work under senior SREs to build dashboards, configure monitoring tools, and help improve service reliability and visibility across systems.

You will collaborate with engineering and operations teams to understand application behavior and contribute to building clear, actionable dashboards and monitoring solutions.

Key Responsibilities
  • Assist in configuring and maintaining observability tools such as Grafana, Prometheus, Loki, and Jaeger
  • Support the implementation of monitoring for the four Golden Signals (latency, traffic, errors, saturation)
  • Build and maintain basic Grafana dashboards for engineering and operations teams
  • Help collect and validate metrics, logs, and traces from applications
  • Assist in troubleshooting production issues using logs and monitoring tools
  • Participate in monitoring performance indicators such as latency, throughput, and error rates
  • Support implementation of alerting rules and basic SLO monitoring
  • Document dashboard structures, monitoring configurations, and operational runbooks
  • Work with senior engineers to improve dashboard usability and visualization clarity
  • Learn and apply SRE best practices in reliability and availability

Required Qualifications
  • 02 years of experience in DevOps, SRE, Monitoring, or Backend Engineering roles
  • Basic understanding of Linux systems, cloud platforms (AWS / Azure / GCP), and microservices architecture
  • Familiarity with monitoring tools such as Grafana or Prometheus
  • Basic knowledge of metrics, logs, and distributed tracing concepts, as well as HTTP status codes and API monitoring
  • Understanding of reliability concepts such as uptime, availability, and incident response
  • Good problem-solving and debugging skills
  • Strong willingness to learn observability engineering and production systems

Technical Skills (Good to Have)
  • Hands-on exposure to Prometheus (metrics collection), Loki (log aggregation), and Jaeger (distributed tracing)
  • Basic understanding of containers (Docker) and Kubernetes
  • Familiarity with CI/CD pipelines
  • Knowledge of alerting systems and monitoring thresholds
  • Exposure to AI / ML or high-traffic applications
