Data Scientist

InCommon

2 - 5 years

Bengaluru

Posted: 28/04/2026

Job Description

InCommon is hiring on behalf of a fast-growing SF/Bangalore based Voice AI Startup.


About the Role

This is not a typical data role. You won't be building dashboards. You won't be maintaining pipelines no one touches.

You will take messy, noisy, real-world data and turn it into something models can learn from. Think of this as running a gold mine: you take dust and turn it into gold.

We work on speech, language, and real-time systems across 50+ languages.

The difference between a good model and a great one is almost always data quality + data systems. That's where you come in.


What You'll Work On

Data Pipelines (Real-time + Batch)

  • Build high-throughput pipelines for audio, text, and multimodal data
  • Streaming + offline processing at scale

Data Quality & Curation

  • Cleaning, filtering, deduplication, normalization (numbers, emails, code-mix, etc.)
  • Designing heuristics + ML-based data filtering systems

Multilingual Data Systems

  • Handling 50+ languages, accents, and code-mixed inputs
  • Language-aware normalization and segmentation

Training Data Engine

  • Build pipelines that continuously generate better training data from production
  • Active learning loops, data selection, sampling strategies

Evaluation & Benchmarking Pipelines

  • Create scalable eval datasets across languages and domains
  • Automate quality tracking for ASR, TTS, and conversational systems

Data Infra for Research

  • Work closely with the research team to unblock experiments fast
  • Build systems that reduce iteration time from weeks to hours


What This Role Is NOT

  • Not a dashboard/reporting role
  • Not a "move data from A to B" role
  • Not a maintenance-heavy legacy pipeline role


What We're Looking For

  • Strong fundamentals in data structures, systems, and pipelines
  • Experience with large-scale data processing (audio/text preferred)
  • Comfortable with messy, unstructured, real-world data
  • Strong coding skills (Python required; systems experience is a plus)
  • Understanding of ML/data pipelines (training, eval, data curation)


Bonus (Not Mandatory)

  • Experience with speech/audio data (ASR/TTS)
  • Familiarity with multilingual datasets
  • Experience with streaming systems (Kafka, etc.)
  • Exposure to data-centric AI / data quality frameworks
