Site Reliability Engineer
Landmark Group
2 - 5 years
Bengaluru
Posted: 10/12/2025
Job Description
What You'll Do:
Ensure reliability and high availability of Java and microservices-based applications through proactive monitoring and automation.
Define and track SLIs/SLOs to maintain service performance and stability.
Troubleshoot and resolve production issues, performing detailed root cause analysis to prevent recurrence.
Build and enhance observability using Prometheus, Grafana, Loki, or New Relic.
Automate operational tasks: deployments, scaling, rollbacks, diagnostics, and alerting.
Collaborate with engineering and DevOps teams to integrate reliability practices into the CI/CD pipeline.
Drive AIOps initiatives for intelligent alert correlation and predictive incident management.
Mentor teams on best practices in monitoring, performance optimization, and operational efficiency.
What We're Looking For:
3-6 years of experience in Site Reliability Engineering, Application Operations, or DevOps.
Strong hands-on experience with Java, Spring Boot, and microservices architecture.
Proficiency in monitoring tools (Prometheus, Grafana, Loki, New Relic, or similar).
Experience with Kubernetes, containers, and cloud platforms (AWS, Azure, or GCP).
Strong scripting skills in Bash, Python, or Go for automation and diagnostics.
Familiarity with incident management, RCA, and performance debugging.
Exposure to AIOps tools or AI/LLM-based observability platforms is a plus.
Excellent problem-solving and communication skills.
