Site Reliability Engineer

NetApp, Inc.

8 - 10 years

Bangalore

Posted: 11/07/2023

Job Description

Job Summary:


As a Cloud Infrastructure/Sr. Site Reliability Engineer, you operate seamlessly between development and operations. You’ll engage in and improve the lifecycle of cloud services, from design to deployment, operation, and refinement. You’ll maintain services by measuring and monitoring availability, latency, and overall system health. You’ll play an important role in scaling systems sustainably through automation and evolving them by pushing for changes that improve reliability and velocity. You will administer the cloud-based environments that support our SaaS/IaaS offerings, which are implemented on a microservices, container-based architecture (Kubernetes).

To be successful in this role, you must be a motivated self-starter and self-learner, possess strong problem-solving skills, and be someone who embraces challenges.

Essential Functions:

•   Work with other Cloud Infrastructure engineers and developers to ensure maximum performance, reliability, and automation of our deployments and infrastructure.

•   Work with, consult, and influence developers on new features and software architecture to ensure scalability.

•   Develop software, both as components of our solution and outside of the solution, for deployment automation, packaging, and monitoring visibility.

•   Identify tasks and areas where automation can be applied to achieve time efficiencies and risk reduction.

•   Debug and troubleshoot service bottlenecks throughout the whole software stack.

•   Measure and monitor availability, latency, and overall system health.

•   Provide advanced escalation support (tier 2 and 3) for NetApp’s Cloud Data Services solutions.

•   Ensure the availability and reliability of distributed systems. Work as a bridge between development, operations, and other teams to build and maintain resilient systems.

•   Adopt and propose automation of repetitive tasks to reduce/eliminate toil.

•   Keep a proactive approach to spotting problems, areas for improvement, and performance bottlenecks.

•   Directly influence the decisions and outcomes related to solution implementation.


Job Requirements:


•   Ability to embrace new technologies and work in a fast-paced, global environment.

•   Systematic problem-solving approach coupled with a sense of ownership and drive.

•   Excellent written and verbal communication skills.

•   Ability to manage competing priorities and multiple deadlines.

•   Good interpersonal communication and customer service skills, needed to work successfully with stakeholders in high-stress and/or ambiguous situations.

•   Strong understanding of systems design and networking, particularly with respect to performance and scale.

•   Familiarity with the lifecycle of cloud services - from design to deployment, operation, and refinement.


Responsibility and Interaction:


•   The types of tasks this individual is responsible for are often unique, non-routine, and unstructured, requiring creative solutions. 

•   This individual will apply acquired experience and knowledge to solve routine to moderately complex problems.

•   This role includes on-call work and travel from time to time.

Interaction:

•   This individual interacts primarily with their direct manager, site reliability team, development team, and hyperscaler partners on assigned projects and deployments. This may involve interaction across functions, geo-locations, and from staff to Vice President level.

•   Limited management direction is provided on new projects or assignments; general guidance is provided on new assignments.

•   The ideal candidate will be a proactive contributor and subject matter expert on team projects.

•   To be successful, this individual must demonstrate favorable results through coaching and influencing others.


Education:


•   A minimum of 8 - 10 years of experience is required. 4 to 6 years of experience is preferred.

•   A Bachelor of Science degree in Computer Science, a master’s degree, or equivalent experience is required.

•   Demonstrated Linux/Unix, CORE OS experience.

•   Scripting and infrastructure automation using, for example, Ansible, Python, Go, Perl, or Ruby.

•   Deep working knowledge of containers, Kubernetes, and serverless computing implementations.

•   Understanding of the SDLC and DevOps development methodologies.

•   Demonstrated ability to complete multiple moderately complex technical tasks.

•   Familiarity with distributed systems design patterns using tools such as Kubernetes.

•   Strong familiarity with Google Cloud.

About Company

NetApp, Inc. is an American data storage and data management services company headquartered in San Jose, California. It ranked in the Fortune 500 from 2012 to 2021. NetApp offers cloud data services for managing applications and data both online and physically. The company was founded in 1992 by David Hitz, James Lau, and Michael Malcolm as Network Appliance, Inc. At the time, its major competitor was Auspex Systems. In 1994, NetApp received venture capital funding from Sequoia Capital, and it had its initial public offering in 1995. NetApp thrived in the internet bubble years of the mid-1990s to 2001, during which the company grew to $1 billion in annual revenue. After the bubble burst, NetApp's revenue quickly declined to $800 million in its fiscal year 2002. Since then, the company's revenue has steadily climbed.
