Site Reliability Engineer
GS Lab & GAVS
3 - 5 years
Chennai
Posted: 14/07/2025
Job Description
Site Reliability Engineer (SRE) - Azure Tech Stack
Experience: 3-5 years
Location: Chennai (work from office)
About the Role:
We are seeking a highly motivated and experienced Site Reliability Engineer (SRE) to join our
growing team. As an SRE, you will be instrumental in ensuring the reliability, scalability, and
performance of our critical applications and infrastructure built on Microsoft Azure. You will
leverage your expertise in Azure services, automation, and incident management to drive
operational excellence and continuous improvement.
Key Responsibilities:
● System Reliability & Performance:
○ Design, implement, and maintain highly available, scalable, and resilient systems
on Azure.
○ Proactively monitor system health, performance, and availability using Azure
Monitor, Application Insights, Log Analytics, and other monitoring tools (e.g.,
Grafana, Prometheus, Splunk).
○ Define, track, and report on Service Level Indicators (SLIs) and Service Level
Objectives (SLOs) to ensure adherence to service availability and performance
targets.
○ Conduct root cause analysis (RCA) for incidents and implement preventive
measures to avoid recurrence.
○ Participate in an on-call rotation to provide 24/7 support for production systems,
diagnosing and resolving critical issues promptly.
● Automation & Infrastructure as Code (IaC):
○ Develop and maintain automation scripts and tools using PowerShell, Python,
Bash, or Go to automate repetitive tasks, deployments, and infrastructure
provisioning.
○ Implement and manage infrastructure using IaC principles with tools like
Terraform or Azure Bicep.
○ Contribute to the design and implementation of robust CI/CD pipelines using
Azure DevOps, GitHub Actions, or similar tools to ensure efficient and reliable
application deployments.
● Azure Ecosystem Management:
○ Deploy, configure, and manage a wide range of Azure services, including:
■ Compute: Azure Virtual Machines, Azure Kubernetes Service (AKS),
Azure Functions, Azure App Service
■ Networking: Azure Virtual Networks, Load Balancers, Azure Front Door,
DNS
■ Storage: Azure Storage Accounts (Blob, File, Queue, Table), Azure SQL
Database, Azure Cosmos DB
■ Monitoring & Logging: Azure Monitor, Application Insights, Log
Analytics, Kusto Query Language (KQL)
■ Security: Microsoft Entra ID (formerly Azure Active Directory), Microsoft
Defender for Cloud (formerly Azure Security Center), Azure Policy, Key Vault,
Network Security Groups (NSGs)
○ Optimize Azure resource utilization for cost efficiency and performance.
● Collaboration & Best Practices:
○ Collaborate closely with development teams in a DevOps culture to integrate
reliability practices early in the software development lifecycle ("shift-left").
○ Promote and implement SRE best practices, including error budgets, blameless
post-mortems, and continuous improvement.
○ Contribute to documentation of system architecture, operational procedures, and
troubleshooting guides.
○ Stay up to date with emerging Azure technologies and SRE trends, proposing
and adopting relevant innovations.
Required Skills & Qualifications:
● Bachelor's degree in Computer Science, Information Technology, or a related field, or
equivalent practical experience.
● 3-5 years of hands-on experience in a Site Reliability Engineering, DevOps, or similar
role with a strong focus on Microsoft Azure.
● Proficiency in at least one scripting or programming language (e.g., Python, PowerShell,
Go, Bash).
● Solid understanding of Infrastructure as Code (IaC) principles and experience with tools
like Terraform or Azure Bicep.
● Demonstrated experience with CI/CD pipelines (Azure DevOps preferred).
● Strong experience with Azure monitoring and logging solutions (Azure Monitor,
Application Insights, Log Analytics, KQL).
● Experience with containerization and orchestration technologies, particularly Azure
Kubernetes Service (AKS).
● Good understanding of networking concepts (TCP/IP, DNS, Load Balancing).
● Familiarity with database systems (SQL and NoSQL).
● Strong problem-solving, analytical, and troubleshooting skills.
● Excellent communication and collaboration skills, with the ability to work effectively in a
team environment.
● Ability to work independently and manage multiple priorities in a fast-paced environment.
Preferred Skills & Certifications:
● Microsoft Certified: Azure Administrator Associate (AZ-104)
● Microsoft Certified: Azure DevOps Engineer Expert (AZ-400)
● Certified Kubernetes Administrator (CKA)
● Experience with other monitoring tools such as Grafana, Prometheus, Splunk, or Datadog.
● Familiarity with security best practices in cloud environments.
● Experience with Git and other version control systems.
What We Offer:
● Opportunity to work on cutting-edge Azure technologies and build highly reliable
systems.
● Opportunity to work on a pan-India application for an enterprise healthcare organization.
● Collaborative and supportive team environment.
● Continuous learning and development opportunities.
● Competitive salary and benefits package.
About Company
GS Lab and GAVS have merged to offer end-to-end digital transformation and IT services. Their combined expertise spans AI/ML, cloud modernization, infrastructure management, and cybersecurity. They serve clients in healthcare, BFSI, and enterprise IT.