Site Reliability Engineer
TRDFIN Support Services Pvt Ltd
2 - 5 years
Noida
Posted: 29/01/2026
Job Description
Role Summary
We are looking for a highly skilled Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of our cloud-native infrastructure. The ideal candidate will bring strong hands-on experience in AWS, Kubernetes, Docker, CI/CD pipelines, monitoring, and automation using Python, and will work closely with development and operations teams to build resilient, highly available systems.
Key Responsibilities
- Design, deploy, and maintain highly available and scalable systems on AWS
- Manage and operate containerized applications using Docker and Kubernetes (EKS)
- Build, maintain, and optimize CI/CD pipelines using Jenkins
- Automate operational workflows and routine tasks using Python scripting
- Implement and manage monitoring, alerting, and observability using Grafana and Prometheus
- Ensure system reliability, performance, uptime, and scalability
- Participate in incident response, root cause analysis (RCA), and post-incident reviews
- Implement Infrastructure as Code (IaC) and automation best practices
- Collaborate with development teams to improve system architecture and deployment strategies
- Enforce security, compliance, and operational best practices in cloud environments
- Continuously improve system efficiency through automation, tooling, and process optimization
Required Skills & Qualifications
- Strong hands-on experience with AWS services (EC2, S3, IAM, VPC, RDS, EKS, etc.)
- Solid experience with Kubernetes (EKS) and Docker
- Proficiency in Python scripting for automation and monitoring
- Experience designing and managing CI/CD pipelines using Jenkins
- Strong understanding of DevOps principles and CI/CD best practices
- Hands-on experience with Grafana and Prometheus for monitoring and alerting
- Strong knowledge of Linux systems and networking fundamentals
- Experience with Git or other version control systems
- Understanding of microservices architecture
Good to Have
- Experience with Terraform or CloudFormation
- Knowledge of Helm, ArgoCD, or similar deployment tools
- Familiarity with log management tools (ELK / EFK stack)
- Understanding of SRE practices such as SLIs, SLOs, SLAs, and error budgets
- AWS certifications and/or Kubernetes certifications (CKA / CKAD)
