AI Kernel Engineer

Snaphunt

2 - 5 years

Pune City

Posted: 11/05/2026

Job Description

You will play a key role in building and optimising high-performance AI kernels for a next-generation compute platform. This role focuses on enabling efficient execution of AI and LLM workloads by developing, profiling, and optimising kernels across varying hardware configurations.

You will be responsible for:

  • Developing AI/LLM kernels and operators for efficient inference on a specialised compute platform
  • Optimising kernel performance across different hardware configurations and workloads
  • Profiling and analysing performance across compute, memory, and parallelism to identify bottlenecks
  • Optimising low-level C/C++ code to maximise hardware utilisation
  • Collaborating across the AI inference stack, including runtime, compiler, and system layers
  • Contributing to improvements in toolchain, compiler, and runtime components
  • Supporting internal teams and external stakeholders with technical insights and documentation


Ideal Candidate

  • You have a Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field
  • You have 5+ years of experience in AI kernel development and performance optimisation
  • You have experience in profiling models and kernel inference performance
  • You have hands-on experience with at least one of the following: CUDA, DSP, NEON, or Triton
  • You have strong proficiency in C/C++ and Python; exposure to assembly is a plus
  • You have strong problem-solving, debugging, and communication skills
  • You are comfortable working close to hardware and across system layers


The Offer

  • Competitive compensation with meaningful equity
  • High-impact role in a deeply technical, low-bureaucracy environment
  • Opportunity to work on cutting-edge AI systems and long-term career growth


About the employer

Our client is a Silicon Valley-based deep-tech company building a new compute architecture for real-time AI at the edge. Founded by engineers from leading research backgrounds, the company focuses on closing the gaps in current neural processing approaches through tight integration of hardware and software.

The platform is built to run both neural network inference and conventional compute workloads efficiently across a wide range of edge devices. Unlike typical accelerators that only handle parts of an ML graph, this architecture supports end-to-end execution, including both neural network graph code and standard C++ DSP and control code, enabling greater flexibility and performance in real-world deployments.
