Specialist, HPC Systems Research & Development
KLA
2 - 5 years
Chennai
Posted: 11/22/2024
Job Description
KLA’s AI Advanced Computing Labs is looking for an extraordinary HPC System R&D Engineer to join its team and develop system-level HPC technologies that will form the foundation of next-generation clusters used in KLA tools, which leverage AI to push the boundaries of process control for semiconductor manufacturing. These technologies will be developed and demonstrated on on-prem clusters that serve as testbeds for next-generation KLA tools.
Your Day-to-day Roles
- Expose the limitations of existing solutions, based on clusters of CPUs and GPUs, for deploying AI-based solutions at scale on on-prem and cloud infrastructure.
- Develop distributed frameworks and system-level solutions that enable scaling out image processing and AI workloads from a single GPU to multi-node clusters with multiple GPUs.
- Install, benchmark, and evaluate pre-release hardware for early-stage prototyping by identifying (or developing) relevant workloads.
Minimum Qualifications
- Master's or PhD in Computer Science or a related field; bachelor's degree holders with relevant experience and an extraordinary track record will also be considered.
- Deep understanding of operating systems, computer networks, and high-performance applications.
- Good mental model of the architecture of modern distributed systems comprising CPUs, GPUs, and accelerators.
- Experience with deployments of deep-learning frameworks based on TensorFlow and PyTorch on large-scale on-prem or cloud infrastructure.
- Strong background in modern and advanced C++ concepts.
- Strong scripting skills in Bash, Python, or similar.
- Good communication skills.
Things to Make Us Go Wow!
- Experience with heterogeneous programming languages such as CUDA and Triton.
- Experience with model development on DL frameworks such as TensorFlow and PyTorch.
- Experience with building open-source operating systems and software stacks on pre-release hardware.
- Solid understanding of container infrastructure such as Docker or Singularity, and Kubernetes.
- Active participation in C++ standards bodies or similar.
We offer a competitive, family-friendly total rewards package. We design our programs to reflect our commitment to an inclusive environment, while ensuring we provide benefits that meet the diverse needs of our employees.
KLA is proud to be an equal opportunity employer.
Be aware of potentially fraudulent job postings or suspicious recruiting activity by persons posing as KLA employees. KLA never asks for any financial compensation to be considered for an interview, to become an employee, or for equipment. Further, KLA does not work with any recruiters or third parties who charge such fees either directly or on behalf of KLA. Please ensure that you have searched KLA’s Careers website for legitimate job postings. KLA follows a recruiting process that involves multiple interviews, in person or by video conference, with our hiring managers. If you are concerned that a communication, an interview, an offer of employment, or an employee is not legitimate, please send an email to talent.acquisition@kla.com to confirm that the person you are communicating with is an employee. We take your privacy very seriously and handle your information confidentially.
About Company
KLA is a global leader in semiconductor process control and yield management solutions. The company designs technologies that enable chip manufacturers to detect defects and improve efficiency, playing a key role in the semiconductor and electronics industries.