Site Reliability Engineer - Database (7 to 10 Years)
PhonePe
7 - 10 years
Bengaluru
Posted: 13/07/2025
Job Description
About PhonePe Group:
PhonePe is India’s leading digital payments company, with 50 crore (500 million) registered users and 3.7 crore (37 million) merchants covering over 99% of the postal codes across India. On the back of its leadership in digital payments, PhonePe has expanded into financial services (Insurance, Mutual Funds, Stock Broking, and Lending) as well as adjacent tech-enabled businesses such as Pincode for hyperlocal shopping and Indus App Store, which is India's first localized app store. The PhonePe Group is a portfolio of businesses aligned with the company's vision to offer every Indian an equal opportunity to accelerate their progress by unlocking the flow of money and access to services.
Culture
At PhonePe, we take extra care to make sure you give your best at work, every day! Creating the right environment for you is just one of the things we do. We empower people and trust them to do the right thing. Here, you own your work from start to finish, right from day one. Being enthusiastic about tech is a big part of being at PhonePe. If you like building technology that impacts millions, ideating with some of the best minds in the country, and executing on your dreams with purpose and speed, join us!
Site Reliability Engineer - Database
Experience: 7 to 10 Years
We are seeking a highly skilled and experienced Site Reliability Engineer (7 to 10 years of experience) with deep expertise in MySQL database administration and a solid foundation in Linux systems engineering. You will play a critical role in ensuring the resilience, scalability, and performance of our distributed, high-volume database infrastructure, which spans tens of terabytes of data across multiple data centers. In this role, you will be expected to design, build, and lead initiatives to improve reliability and efficiency across the database stack, mentor SRE/DBA team members, and drive strategic improvements to infrastructure.
Responsibilities
- Database Architecture & Management: Lead the design, provisioning, and lifecycle management of large-scale MySQL/Galera multi-master clusters across multiple geographic locations.
- Reliability Engineering: Develop and implement database reliability strategies, including automated failure recovery and disaster recovery solutions.
- Troubleshooting & Support: Investigate and resolve database-related issues, including performance problems, connectivity issues, and data corruption.
- Performance, Optimization & Security: Own and continuously improve performance tuning (query optimization, indexing, and resource management), security hardening, and high availability of database systems.
- Operational Excellence:
  - Standardize and automate database operational tasks such as upgrades, backups, schema changes, and replication management.
  - Drive capacity planning, monitoring, and incident response across infrastructure.
- Incident Management: Proactively identify, diagnose, and resolve complex production issues in collaboration with the engineering team.
- On-Call & Tooling:
  - Participate in and enhance on-call rotations, implementing tools to reduce alert fatigue and human error.
  - Develop and maintain observability tooling for database systems.
- Leadership & Mentorship: Mentor and guide junior and mid-level SREs and DBAs, fostering knowledge sharing and skill development within the team.
Skills and Qualifications
Core Expertise:
- Expertise in Linux systems administration, scripting (Bash/Python), file systems, disk management, and debugging system-level performance issues.
- 7–8+ years of hands-on experience in MySQL database administration in large-scale, high-availability environments.
- Deep understanding of MySQL internals, InnoDB storage engine, replication mechanisms (async, semi-sync, Galera), and tuning parameters.
- Proven experience managing 100+ production clusters and databases larger than 1TB in size.
Preferred Experience:
- Hands-on experience with Galera clusters is a strong plus.
- Familiarity with Infrastructure-as-Code tools like Ansible, Terraform, or similar.
- Experience with observability tools such as Prometheus, Grafana, or Percona Monitoring & Management.
- Exposure to NoSQL databases (e.g., Aerospike) is a plus.
- Experience working in on-premises environments is highly desirable.
Leadership & Communication:
- Proven ability to lead cross-functional initiatives, including database migrations, major version upgrades, and scaling efforts.
- Excellent communication skills with a demonstrated track record of mentoring and technical leadership.
PhonePe Full Time Employee Benefits (Not applicable for Intern or Contract Roles)
- Insurance Benefits - Medical Insurance, Critical Illness Insurance, Accidental Insurance, Life Insurance
- Wellness Program - Employee Assistance Program, Onsite Medical Center, Emergency Support System
- Parental Support - Maternity Benefit, Paternity Benefit Program, Adoption Assistance Program, Day-care Support Program
- Mobility Benefits - Relocation benefits, Transfer Support Policy, Travel Policy
- Retirement Benefits - Employee PF Contribution, Flexible PF Contribution, Gratuity, NPS, Leave Encashment
- Other Benefits - Higher Education Assistance, Car Lease, Salary Advance Policy
Working at PhonePe is a rewarding experience! Great people, a work environment that thrives on creativity, and the opportunity to take on roles beyond a defined job description are just some of the reasons you should work with us. Read more about PhonePe on our blog.
About Company
PhonePe is one of India's leading digital payments platforms, offering services like money transfer, utility payments, recharges, and investments. It empowers users with a seamless and secure UPI-based transaction ecosystem.