Company Description

Eros Innovation is a global technology company operating at the intersection of Artificial Intelligence, media, and next-generation digital platforms. We focus on building advanced Generative AI solutions, multimodal systems, and scalable AI infrastructure that power real-world enterprise applications.

At the core of our ecosystem is Eros Gen AI, our proprietary Generative AI platform, designed to deliver cutting-edge capabilities across large language models (LLMs), vision-language systems, speech AI, and retrieval-augmented intelligence. Eros Gen AI drives both research innovation and production-grade deployments, enabling intelligent automation and AI-driven transformation at scale.

If you're passionate about building impactful AI systems and working on frontier technologies, Eros Innovation is where innovation meets execution.

Role Description

We are seeking a highly skilled Generative Speech Research Engineer with 3+ years of experience to design, train, and deploy advanced Text-to-Speech (TTS) and generative speech models at scale.

This role focuses on developing state-of-the-art voice AI systems, including controllable speech synthesis, expressive voice modeling, and foundation model training for speech and audio generation.

The ideal candidate should have strong expertise in deep learning, generative architectures, distributed training, and production-grade deployment of AI models.

Key Responsibilities

Design and train large-scale TTS and generative speech models
Develop transformer, diffusion, and autoregressive architectures
Implement distributed training across multi-GPU and multi-node clusters
Build scalable training pipelines using PyTorch and CUDA
Optimize model performance, scalability, and efficiency
Design novel architectures for speech generation
Develop semantic-to-acoustic mapping pipelines
Implement token-based and representation learning approaches
Integrate LLM-based speech representation systems
Implement supervised fine-tuning (SFT)
Apply reinforcement learning and preference-based alignment methods (e.g., PPO, DPO)
Build voice cloning and expressive speech adaptation pipelines
Design custom loss functions and optimization strategies
Evaluate models using objective and perceptual metrics (WER, MOS, etc.)
Optimize inference latency and generation quality
Deploy models as scalable production APIs
Collaborate with cross-functional teams to translate research into production

Qualifications

3+ years of experience in Deep Learning / Generative AI
Strong proficiency in Python
Hands-on expertise in PyTorch
Experience training TTS, speech synthesis, or generative audio models
Strong understanding of Transformers, diffusion, and generative architectures
Experience with distributed training frameworks (DDP, FSDP, DeepSpeed)
Solid understanding of speech representation learning
Experience working with GPU clusters and accelerators
Strong system design and research implementation skills

Generative Speech Research Engineer (Voice AI)

Eros Innovation
