Site Reliability Engineer

GREYTIP SOFTWARE PRIVATE LIMITED

5 - 7 years

Bengaluru

Posted: 10/12/2025

Job Description

About the Role

We are looking for a skilled Site Reliability Engineer II to join our SRE team. The ideal candidate will have hands-on experience in production monitoring, alert handling, and L1 production support. You will play a key role in ensuring the reliability, availability, and performance of our production systems.

Key Responsibilities

  • Monitor production systems using enterprise monitoring tools and dashboards.
  • Respond to alerts promptly and take appropriate first-level actions.
  • Provide L1 production support, including initial triage, log analysis, and escalation to relevant teams as needed.
  • Participate in incident management, including documentation, communication, and coordination during production incidents.
  • Perform basic troubleshooting for application, infrastructure, and platform issues.
  • Ensure adherence to SLAs, SLOs, and operational best practices.
  • Contribute to runbooks, knowledge base articles, and incident postmortems.
  • Collaborate with engineering and DevOps teams for incident resolution and improvements.
  • Participate in on-call rotations as required.

Required Skills & Qualifications

  • 2-5 years of experience in SRE, Production Support, DevOps, or similar roles.
  • Hands-on experience with production monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic, Splunk, CloudWatch, etc.).
  • Strong understanding of alerting systems, incident lifecycle, and on-call processes.
  • Basic troubleshooting knowledge in Linux/Unix, networking fundamentals, and cloud environments.
  • Familiarity with logging tools (e.g., ELK, Splunk, Cloud Logging).
  • Ability to communicate clearly during incidents and coordinate with cross-functional teams.
  • Strong analytical, problem-solving, and time-management skills.

Good to Have

  • Experience with cloud platforms (AWS/Azure/GCP).
  • Basic scripting skills (Python, Shell, Bash).
  • Exposure to CI/CD pipelines and DevOps practices.
  • Understanding of SLOs, SLIs, and reliability engineering principles.
