
Generative Speech Research Engineer (Voice AI)

Eros Innovation

2 - 5 years

Chennai

Posted: 06/03/2026


Job Description

Company Description

Eros Innovation is a global technology company operating at the intersection of Artificial Intelligence, media, and next-generation digital platforms. We focus on building advanced Generative AI solutions, multimodal systems, and scalable AI infrastructure that power real-world enterprise applications.

At the core of our ecosystem is Eros Gen AI, our proprietary Generative AI platform designed to deliver cutting-edge capabilities across large language models (LLMs), vision-language systems, speech AI, and retrieval-augmented intelligence. Eros Gen AI drives both research innovation and production-grade deployments, enabling intelligent automation and AI-driven transformation at scale.

If you're passionate about building impactful AI systems and working on frontier technologies, Eros Innovation is where innovation meets execution.


Role Description

We are seeking a highly skilled Generative Speech Research Engineer with 3+ years of experience to design, train, and deploy advanced Text-to-Speech (TTS) and generative speech models at scale.

This role focuses on developing state-of-the-art voice AI systems, including controllable speech synthesis, expressive voice modeling, and foundation model training for speech and audio generation.

The ideal candidate should have strong expertise in deep learning, generative architectures, distributed training, and production-grade deployment of AI models.

Key Responsibilities

  • Design and train large-scale TTS and generative speech models
  • Develop transformer, diffusion, and autoregressive architectures
  • Implement distributed training across multi-GPU and multi-node clusters
  • Build scalable training pipelines using PyTorch and CUDA
  • Optimize model performance, scalability, and efficiency
  • Design novel architectures for speech generation
  • Develop semantic-to-acoustic mapping pipelines
  • Implement token-based and representation learning approaches
  • Integrate LLM-based speech representation systems
  • Implement supervised fine-tuning (SFT)
  • Apply reinforcement learning and preference-based alignment methods (PPO, DPO)
  • Build voice cloning and expressive speech adaptation pipelines
  • Design custom loss functions and optimization strategies
  • Evaluate models using objective and perceptual metrics (WER, MOS, etc.)
  • Optimize inference latency and generation quality
  • Deploy models as scalable production APIs
  • Collaborate with cross-functional teams to translate research into production


Qualifications

  • 3+ years of experience in Deep Learning / Generative AI
  • Strong proficiency in Python
  • Hands-on expertise in PyTorch
  • Experience training TTS, speech synthesis, or generative audio models
  • Strong understanding of Transformers, diffusion, and generative architectures
  • Experience with distributed training frameworks (DDP, FSDP, DeepSpeed)
  • Solid understanding of speech representation learning
  • Experience working with GPU clusters and accelerators
  • Strong system design and research implementation skills


