InCommon is hiring on behalf of a fast-growing SF/Bangalore-based Voice AI startup.

About the Role

This is not a typical data role. You won't be building dashboards. You won't be maintaining pipelines no one touches.

You will take messy, noisy, real-world data and turn it into something models can learn from. Think of this as running a gold mine - you take dust and convert it to gold.

We work on speech, language, and real-time systems across 50+ languages.

The difference between a good model and a great one is almost always data quality + data systems. That's where you come in.

What You'll Work On

Data Pipelines (Real-time + Batch)

Build high-throughput pipelines for audio, text, and multimodal data
Streaming + offline processing at scale

Data Quality & Curation

Cleaning, filtering, deduplication, normalization (numbers, emails, code-mix, etc.)
Designing heuristics + ML-based data filtering systems

Multilingual Data Systems

Handling 50+ languages, accents, and code-mixed inputs
Language-aware normalization and segmentation

Training Data Engine

Build pipelines that continuously generate better training data from production
Active learning loops, data selection, sampling strategies

Evaluation & Benchmarking Pipelines

Create scalable eval datasets across languages and domains
Automate quality tracking for ASR, TTS, and conversational systems

Data Infra for Research

Work closely with the research team to unblock experiments fast
Build systems that reduce iteration time from weeks to hours

What This Role Is NOT

Not a dashboard/reporting role
Not a "move data from A to B" role
Not a maintenance-heavy legacy pipeline role

What We're Looking For

Strong fundamentals in data structures, systems, and pipelines
Experience with large-scale data processing (audio/text preferred)
Comfortable with messy, unstructured, real-world data
Strong coding skills (Python required; systems experience is a plus)
Understanding of ML/data pipelines (training, eval, data curation)

Bonus (Not Mandatory)

Experience with speech/audio data (ASR/TTS)
Familiarity with multilingual datasets
Experience with streaming systems (Kafka, etc.)
Exposure to data-centric AI / data quality frameworks

Data Scientist

InCommon
