Audio ML Engineer - ASR
The Future of Voice
2 - 5 years
Bengaluru
Posted: 21/02/2026
Job Description
Role Summary:
We are hiring a highly skilled Audio ML Engineer to build and optimize our end-to-end Automatic Speech Recognition (ASR) pipeline. The role includes development of streaming ASR, multilingual transcription, Language Identification (LID), and hallucination correction systems. You will work with state-of-the-art architectures, delivering scalable, production-grade ASR solutions powering our Voice AI platform.
Key Responsibilities
Develop, train, and optimize ASR architectures (Whisper, Conformer, RNN-T, CTC-based models).
Build streaming ASR pipelines with chunk-based inference and strict low-latency constraints.
Implement language identification (LID), punctuation restoration, and hallucination suppression mechanisms.
Integrate ASR with diarization, DSP pipelines, and TTS for synchronized multi-speaker systems.
Optimize inference on GPUs using TensorRT, mixed precision, quantization, and dynamic batching.
Design and deploy scalable ASR APIs using FastAPI, Docker, and Kubernetes.
Develop evaluation frameworks for WER, CER, latency, memory usage, and throughput benchmarking.
Curate multilingual datasets and fine-tune models for noisy and low-resource language environments.
Required Expertise
Strong proficiency in PyTorch with hands-on ASR model training and deployment experience.
Deep understanding of CTC, RNN-T, attention-based ASR architectures, and speech encoders.
Experience with streaming ASR, VAD gating, chunk alignment, and timestamp stability.
Knowledge of LID systems, punctuation restoration, and post-processing pipelines.
Practical experience in GPU optimization and high-throughput inference systems.
Familiarity with FastAPI, Docker, CI/CD workflows, and Kubernetes orchestration.
Preferred Qualifications
Experience fine-tuning Whisper or building RNN-T based production ASR systems.
Experience building multilingual ASR systems for global or low-resource languages.
Exposure to enterprise-grade speech platforms and scalable ML infrastructure.
Education Qualifications
B.Tech / B.E in Computer Science, Electronics, Electrical, or related fields (Required).
M.Tech / M.E in AI, ML, or Speech Processing (Preferred).
Ph.D. in ASR, Audio ML, or Deep Learning (Advantage).
Role Details
Location: Hybrid / On-site (Based on Project Needs)
Level: Core Platform Engine
Division: Voice AI Platform
