Role Summary

We are hiring a highly skilled Audio ML Engineer to build and optimize our end-to-end Automatic Speech Recognition (ASR) pipeline. The role includes development of streaming ASR, multilingual transcription, Language Identification (LID), and hallucination correction systems. You will work with state-of-the-art architectures, delivering scalable, production-grade ASR solutions that power our Voice AI platform.

Key Responsibilities

Develop, train, and optimize ASR architectures (Whisper, Conformer, RNN-T, CTC-based models).

Build streaming ASR pipelines with chunk-based inference and strict low-latency constraints.

Implement language identification (LID), punctuation restoration, and hallucination suppression mechanisms.

Integrate ASR with diarization, DSP pipelines, and TTS for synchronized multi-speaker systems.

Optimize inference on GPUs using TensorRT, mixed precision, quantization, and dynamic batching.

Design and deploy scalable ASR APIs using FastAPI, Docker, and Kubernetes.

Develop evaluation frameworks for benchmarking WER, CER, latency, memory usage, and throughput.

Curate multilingual datasets and fine-tune models for noisy and low-resource language environments.

Required Expertise

Strong proficiency in PyTorch, with hands-on ASR model training and deployment experience.

Deep understanding of CTC, RNN-T, attention-based ASR architectures, and speech encoders.

Experience with streaming ASR, VAD gating, chunk alignment, and timestamp stability.

Knowledge of LID systems, punctuation restoration, and post-processing pipelines.

Practical experience in GPU optimization and high-throughput inference systems.

Familiarity with FastAPI, Docker, CI/CD workflows, and Kubernetes orchestration.

Preferred Qualifications

Experience fine-tuning Whisper or building RNN-T-based production ASR systems.

Experience building multilingual ASR systems for global or low-resource languages.

Exposure to enterprise-grade speech platforms and scalable ML infrastructure.

Education Qualifications

B.Tech / B.E. in Computer Science, Electronics, Electrical Engineering, or a related field (Required).

M.Tech / M.E. in AI, ML, or Speech Processing (Preferred).

Ph.D. in ASR, Audio ML, or Deep Learning (Advantage).

Role Details

Location: Hybrid / On-site (based on project needs)

Level: Core Platform Engine

Division: Voice AI Platform

Audio ML Engineer - ASR