Site Reliability Engineer
NeuAlto
2 - 4 years
Bengaluru
Posted: 21/02/2026
Job Description
Experience: 02 Years
Location: Bangalore
Job Type: Permanent, Full time, WFO
Role Overview
As a Site Reliability Engineer (SRE), Observability, you will support the design, implementation, and maintenance of monitoring and observability platforms for customer-facing and AI-driven applications.
This is an entry-level to early-career role in which you will work under senior SREs to build dashboards, configure monitoring tools, and help improve service reliability and visibility across systems.
You will collaborate with engineering and operations teams to understand application behavior and contribute to building clear, actionable dashboards and monitoring solutions.
Key Responsibilities
- Assist in configuring and maintaining observability tools such as Grafana, Prometheus, Loki, and Jaeger
- Support the implementation of Golden Signals (Latency, Traffic, Errors, Saturation)
- Build and maintain basic Grafana dashboards for engineering and operations teams
- Help collect and validate metrics, logs, and traces from applications
- Assist in troubleshooting production issues using logs and monitoring tools
- Participate in monitoring performance indicators such as latency, throughput, and error rates
- Support implementation of alerting rules and basic SLO monitoring
- Document dashboard structures, monitoring configurations, and operational runbooks
- Work with senior engineers to improve dashboard usability and visualization clarity
- Learn and apply SRE best practices in reliability and availability
Requirements
- 02 years of experience in DevOps, SRE, Monitoring, or Backend Engineering roles
Basic understanding of:
- Linux systems
- Cloud platforms (AWS / Azure / GCP)
- Microservices architecture
- Familiarity with monitoring tools such as Grafana or Prometheus
Basic knowledge of:
- Metrics, logs, and distributed tracing concepts
- HTTP status codes and API monitoring
- Understanding of reliability concepts such as uptime, availability, and incident response
- Good problem-solving and debugging skills
- Strong willingness to learn observability engineering and production systems
Hands-on exposure to:
- Prometheus (metrics collection)
- Loki (log aggregation)
- Jaeger (distributed tracing)
- Basic understanding of containers (Docker) and Kubernetes
- Familiarity with CI/CD pipelines
- Knowledge of alerting systems and monitoring thresholds
- Exposure to AI/ML or high-traffic applications
