Site Reliability Engineer
GREYTIP SOFTWARE PRIVATE LIMITED
5 - 7 years
Bengaluru
Posted: 10/12/2025
Job Description
About the Role
We are looking for a skilled Site Reliability Engineer II to join our SRE team. The ideal candidate will have hands-on experience in production monitoring, alert handling, and L1 production support. You will play a key role in ensuring the reliability, availability, and performance of our production systems.
Key Responsibilities
- Monitor production systems using enterprise monitoring tools and dashboards.
- Respond to alerts promptly and take appropriate first-level actions.
- Provide L1 production support, including initial triage, log analysis, and escalation to relevant teams as needed.
- Participate in incident management, including documentation, communication, and coordination during production incidents.
- Perform basic troubleshooting for application, infrastructure, and platform issues.
- Ensure adherence to SLAs, SLOs, and operational best practices.
- Contribute to runbooks, knowledge base articles, and incident postmortems.
- Collaborate with engineering and DevOps teams for incident resolution and improvements.
- Participate in on-call rotations as required.
Required Skills & Qualifications
- 2-5 years of experience in SRE, Production Support, DevOps, or similar roles.
- Hands-on experience with production monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic, Splunk, CloudWatch).
- Strong understanding of alerting systems, incident lifecycle, and on-call processes.
- Basic troubleshooting knowledge in Linux/Unix, networking fundamentals, and cloud environments.
- Familiarity with logging tools (e.g., ELK, Splunk, Cloud Logging).
- Ability to communicate clearly during incidents and coordinate with cross-functional teams.
- Strong analytical, problem-solving, and time-management skills.
Good to Have
- Experience with cloud platforms (AWS/Azure/GCP).
- Basic scripting skills (Python, Shell, Bash).
- Exposure to CI/CD pipelines and DevOps practices.
- Understanding of SLOs, SLIs, and reliability engineering principles.
