Site Reliability Engineer

Landmark Group

2-5 years

Bengaluru

Posted: 10/12/2025

Job Description

What You'll Do:

Ensure reliability and high availability of Java and microservices-based applications through proactive monitoring and automation.

Define and track SLIs/SLOs to maintain service performance and stability.

Troubleshoot and resolve production issues, performing detailed root cause analysis to prevent recurrence.

Build and enhance observability using Prometheus, Grafana, Loki, or New Relic.

Automate operational tasks such as deployments, scaling, rollbacks, diagnostics, and alerting.

Collaborate with engineering and DevOps teams to integrate reliability practices into the CI/CD pipeline.

Drive AIOps initiatives for intelligent alert correlation and predictive incident management.

Mentor teams on best practices in monitoring, performance optimization, and operational efficiency.

What We're Looking For:

3-6 years of experience in Site Reliability Engineering, Application Operations, or DevOps.

Strong hands-on experience with Java, Spring Boot, and microservices architecture.

Proficiency in monitoring tools (Prometheus, Grafana, Loki, New Relic, or similar).

Experience with Kubernetes, containers, and cloud platforms (AWS, Azure, or GCP).

Strong scripting skills in Bash, Python, or Go for automation and diagnostics.

Familiarity with incident management, root cause analysis (RCA), and performance debugging.

Exposure to AIOps tools or AI/LLM-based observability platforms is a plus.

Excellent problem-solving and communication skills.