Inference Optimization Engineer (LLM and Runtime)

Sustainability Economics.ai

2 - 5 years

Bengaluru

Posted: 12/02/2026

Job Description

Location: Bengaluru, Karnataka

About the Company:

Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to fulfilling this vision over the long term through our technical innovation, client services, expertise, and capability expansion.

Role Summary:

We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends frontier generative AI research with robust engineering, requiring expertise in machine learning, deep learning, and large language models (LLMs), as well as awareness of the latest industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability:

  • Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.
  • Apply and evaluate advanced model optimization techniques — such as quantization, pruning, distillation, tensor parallelism, and caching strategies — to enhance model efficiency, throughput, and inference performance.
  • Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.
  • Optimize runtime performance of inference stacks using frameworks such as vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.
  • Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).
  • Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost per token during production inference.
  • Evaluate hardware-software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.
  • Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton metrics to drive continuous improvement.

Key Requirements:

Education & Experience

  • Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).
  • 2-3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills

  • Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.
  • Collaborative mindset, with the ability to work across research, engineering, and product teams.
  • Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.
  • Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.

What You'll Do

  • Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.
  • Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.
  • Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What you will bring

  • Hands-on experience in building and optimizing machine learning or agentic systems at scale.
  • A builder's mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.
  • Startup DNA: comfort with ambiguity, love for fast iteration, and a flexible, growth-oriented mindset.

Why Join Us

  • Shape a first-of-its-kind AI + clean energy platform.
  • Work with a small, mission-driven team obsessed with impact.
  • An aggressive growth path.
  • A chance to leave your mark at the intersection of AI and sustainability.
