Lead, Site Reliability Engineer

Toyota Connected

8 - 12 years

Chennai

Posted: 25/08/2025

Job Description

About Toyota Connected: 

If you want to change the way the world works, transform the automotive industry and positively impact others on a global scale, then Toyota Connected is the right place for you! Within our collaborative, fast-paced environment we focus on continual improvement and work in a highly iterative way to deliver exceptional value in the form of connected products and services that wow and delight our customers and the world around us. Come help us re-imagine what mobility can be today and for years to come! 

 

About the Team: 

Toyota Connected India is looking for a Lead Site Reliability Engineer. This team is focused on creating infotainment solutions on embedded and cloud platforms. Team members are expected to be creative in solving problems, excited to work in new technology areas, and ready to wear multiple hats to get things done. This is a highly energized, fast-paced, innovative and collaborative startup environment; therefore, it is essential that not only the skillset but also the personality matches such an environment.

 


Responsibilities:

·       Assist in the design and implementation of reliable and scalable systems using Kubernetes, Docker, and Istio.

·       Proactively identify performance improvements in areas such as responsiveness, availability, and scalability.

·       Monitor system performance and respond to incidents as they arise, utilizing Datadog for observability.

·       Help develop automation scripts for deployment and monitoring.

·       Leverage GitOps to ensure that software can reliably and smoothly be shipped to production.

·       Collaborate with development teams to identify and resolve reliability issues.

·       Conduct load testing to verify that systems can handle expected loads for new products and updates to existing products.

·       Implement A/B deployments, canary deployments, and traffic mirroring strategies to ensure critical updates go smoothly and can be rolled back easily if necessary.

·       Utilize Helm charts for application deployment and management.

·       Understand AWS systems, including AWS Load Balancers, EKS and routing, to support systems handling millions of requests per hour.

·       Ensure that solutions are cost-effective while providing a high-quality customer experience and maintaining very high availability.

·       Participate in on-call rotations and support production systems, collaborating with SREs in other parts of the world.

·       Contribute to documentation and knowledge sharing within the team.

·       Assist in the implementation of best practices for system reliability.

You are a successful candidate if you have:

·       8+ years of experience in Site Reliability Engineering, DevOps, or a related field.

·       Expertise with AWS.

·       Expertise with Kubernetes, Docker, and Istio.

·       Knowledge of monitoring and alerting tools such as Datadog, AppDynamics, ELK, Grafana, or Prometheus.

·       Experience implementing and tuning Horizontal Pod Autoscalers (HPAs) to optimize resource utilization.

·       Understanding of Argo CD for GitOps practices.

·       Familiarity with A/B, canary, and blue/green deployments, as well as traffic mirroring techniques.

About Company

Toyota Connected is a global technology arm of Toyota focusing on smart mobility solutions. It uses AI, cloud computing, and big data to create personalized driving experiences and connected vehicle ecosystems. The company enhances Toyota's vision for a mobility-driven future.
