Machine Learning Engineer

InfiVR

2 - 5 years

Bengaluru

Posted: 12/05/2026

Job Description

Job Title: ML Engineer, Edge AI (Vision, Voice & On-Device LLMs)

Company: InfiVR

Location: Bangalore, India (On-Site)

Employment Type: Full-time

Experience Level: 2-5 Years


About InfiVR

InfiVR delivers AI, computer vision, and immersive digital solutions for industrial enterprises across Oil & Gas, Defense, Healthcare, and Aerospace. We build intelligent applications that run in real-world operational environments: on edge devices, in the field, often without connectivity. Our clients include Fortune 500 companies and leading global organizations.


About the Role

We are looking for an ML Engineer who can take AI models from research to production on mobile and edge hardware. This is not a training-in-the-cloud role: you will be selecting, optimizing, and deploying models that run entirely on-device under tight compute, memory, and power constraints. You will work across computer vision, speech-to-text, and small language models, shipping them on Qualcomm chipsets with NPU acceleration for industrial field applications.


Responsibilities

Evaluate and benchmark pre-trained models for on-device deployment across object detection, OCR, speech-to-text, and conversational AI.

Quantize and optimize models (INT8, INT4, W4A16) using AIMET or equivalent tools, targeting ONNX, TFLite, and QNN formats.

Profile and optimize inference latency, memory usage, and thermal performance on actual target hardware.

Integrate models into Android applications using ONNX Runtime (QNN Execution Provider), whisper.cpp, llama.cpp, and NDK.

Build and maintain on-device RAG pipelines using local vector stores and small language models.

Collaborate with Android developers on camera, audio, and sensor integration for AI-powered field applications.


Requirements

2+ years deploying ML models on mobile or edge devices: not just training, but shipping on real hardware.

Hands-on experience with model quantization and optimization for constrained environments.

Strong working knowledge of at least two of: object detection (YOLO family, EfficientDet), speech-to-text (Whisper, wav2vec), or small language models (Phi, Gemma, Llama).

Proficiency in Python and PyTorch for model preparation.

Comfortable with C/C++ and Android NDK for on-device integration.

Understanding of the ONNX model format and runtime ecosystem.


Good to Have

Experience with the Qualcomm AI stack (AI Hub, QNN SDK, Hexagon NPU).

Familiarity with PaddleOCR or similar mobile OCR frameworks.

Prior work on industrial, field-deployed, or offline-first applications.

Exposure to on-device embedding models and vector search.
