Site Reliability Engineer
Signzy
2 - 6 years
Bengaluru
Posted: 26/02/2026
Job Description
About Signzy
Signzy is an AI-powered RPA platform for financial services. However complex your workflows or operations, Signzy can fully automate your back-operations decision-making process into a real-time API. This is possible through a combination of Nebula, our no-code AI model builder, and our Fintech API Marketplace of 200+ APIs. Today we work with 90+ financial institutions globally, including the 4 largest banks in India and a top-3 acquiring bank in the US. We have a strong global partnership with MasterCard, and offices in New York and Dubai to serve our customers in those two geographies. Our product team of 120+ people is building a global AI product out of Bangalore.
Job Location: Bangalore and Dubai
Working at Signzy
At Signzy we breathe software and exploit the latest technologies to create the most amazing products. We are a tech-savvy team backed by investors who are enthusiastic about creating solutions using technology. This is an invitation to be a part of the future!
Responsibilities
- Design, deploy, and operate reliable and scalable systems across cloud and Kubernetes environments.
- Automate infrastructure provisioning, deployments, and operational workflows.
- Build and maintain tools for deployment, monitoring, and system operations.
- Monitor system health and performance, and proactively identify areas for improvement.
- Troubleshoot and resolve issues across development, test, and production environments.
- Participate in incident response, root cause analysis, and reliability improvements.
- Collaborate with engineering teams to improve system operability and deployment safety.
- Support and operate large-scale systems, including data-intensive or AI-driven workloads.
Requirements
- 2 to 6 years of experience managing and operating production infrastructure and services in cloud environments such as AWS, Azure, or GCP.
- Strong hands-on experience with Linux systems in production environments.
- Experience working with containerized workloads and Kubernetes in real-world scenarios.
- Working knowledge of Infrastructure as Code tools such as Terraform, Terragrunt, or Crossplane.
- Experience designing and maintaining CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, Azure DevOps, or similar.
- Familiarity with GitOps principles and tools such as Argo CD or Flux.
- Solid understanding of cloud networking concepts, load balancing, and service connectivity.
- Experience with monitoring, logging, and alerting systems such as Prometheus, Grafana, ELK/EFK, Datadog, or equivalent.
- Proficiency in at least one scripting or programming language (e.g., Bash, Python).
- Experience working with relational databases; exposure to NoSQL or data platforms is a plus.
- Experience participating in on-call rotations, responding to production incidents, and performing root cause analysis.
- Understanding of SLIs, SLOs, and error budgets, and how they are used to guide reliability and operational decisions.
- Strong problem-solving skills and the ability to debug complex production issues.
- Good verbal and written communication skills, especially during incidents and technical discussions.
Nice to Have
- Experience operating systems at scale or in high-availability environments.
- Exposure to on-prem or hybrid infrastructure.
- Experience supporting data platforms, analytics, or AI/ML workloads.
What We Value
- A strong sense of ownership and responsibility for production systems.
- A focus on automation, reliability, and operational simplicity.
- The ability to balance speed, stability, and long-term maintainability.
- Curiosity and willingness to continuously improve systems and processes.
