Exploring Voice AI in 2025: Super Dial's Approach & Challenges

As a voice AI engineer at Super Dial, I'll dive into the challenges of building reliable voice agents and share insights on how we approach conversational design, tooling, and infrastructure to deliver value in the last mile.

  • 1. Nick is an engineer at Super Dial, a company that specializes in phone calls for mid- to large-sized healthcare administration businesses.
  • 2. Super Dial's platform allows customers to build their own script and conversation design, ask questions over the phone, and receive structured results.
  • 3. The platform uses voice bots for initial calls and falls back on human operators when necessary, ensuring reliable call completion and answers for customers.
  • 4. Super Dial aims to learn from each call, updating office hours and improving phone tree traversal for future calls.
  • 5. Voice AI in 2025 features models that are smart, fast, and affordable, enabling more complex conversational use cases.
  • 6. Challenges remain in turning chat agents into voice agents and dealing with audio hallucinations, pronunciation, and spelling issues.
  • 7. The explosion of voice AI infrastructure and tooling raises the question: what's worth owning? Speech-to-speech (voice-to-voice) models are a significant open question.
  • 8. Super Dial believes that these models aren't yet reliable for production use, as they can generate incorrect or nonsensical responses.
  • 9. Shifting from prescriptive to descriptive development is essential when designing voice UI.
  • 10. Conversation designers should consider whether to be open-ended or constrain user choices in questions, adapting to the caller's response instead of preventing mistakes.
  • 11. Hiring a conversation designer and doing table reads can help engineers identify gaps and awkwardness in dialogues.
  • 12. Super Dial uses Pipecat for voice AI orchestration, an open-source framework that's easy to extend and hack on, allowing self-hosting, deployment, and scaling.
  • 13. Owning your own OpenAI-compatible endpoint gives you a cleaner interface to new voice AI tools, as it enables routing to different models based on latency or other factors.
  • 14. An open-source tool like TensorZero provides structured, typed LLM endpoints for running experiments in production environments.
  • 15. For text-to-speech systems, ensuring the correct pronunciation of names and handling long strings of characters is essential.
  • 16. Upgrade paths and fallbacks are crucial when working with third-party services that experience occasional downtime.
  • 17. End-to-end testing voice AI applications can be challenging, as there isn't yet a standard method or tool for this purpose.
  • 18. Riding the wave of new developments in voice AI is important, and using new models quickly and safely will help engineers stay ahead in the field.
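Point 13 describes routing requests to different models behind a single self-owned endpoint. A minimal sketch of that idea, with hypothetical model names and latency numbers (none of this is Super Dial's actual configuration):

```python
# Hypothetical latency-based model router behind a single endpoint.
# Model names and latency figures are illustrative only.
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str
    p95_latency_ms: float  # observed 95th-percentile response time


def pick_model(options: list[ModelOption], budget_ms: float) -> str:
    """Return the first model whose observed latency fits the budget,
    falling back to the fastest model if none qualifies."""
    within_budget = [m for m in options if m.p95_latency_ms <= budget_ms]
    if within_budget:
        return within_budget[0].name
    # Nothing fits: degrade gracefully to the fastest available model.
    return min(options, key=lambda m: m.p95_latency_ms).name


options = [
    ModelOption("big-accurate-model", 900.0),
    ModelOption("small-fast-model", 250.0),
]
```

Because voice turns have tight latency budgets, even a simple policy like this can decide per-call whether a slower, more capable model is affordable.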
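Point 15 mentions that TTS systems mangle long character strings such as member IDs. One common workaround, sketched here as an assumption rather than Super Dial's actual pipeline, is to pre-process text so long alphanumeric runs are read character by character:

```python
import re


def spell_out_ids(text: str, min_len: int = 6) -> str:
    """Insert spaces between the characters of long alphanumeric runs
    (e.g. insurance member IDs) so a TTS engine spells them out
    instead of attempting to pronounce them as a word."""
    def spaced(match: re.Match) -> str:
        return " ".join(match.group(0))

    # Only runs of uppercase letters/digits at least min_len long are touched.
    return re.sub(rf"\b[A-Z0-9]{{{min_len},}}\b", spaced, text)
```

Short tokens like extensions or years pass through unchanged, which keeps normal sentences natural-sounding.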
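Point 16 stresses fallbacks for third-party services that go down. A minimal sketch of provider failover, assuming a hypothetical interface where each provider is a callable that raises on failure:

```python
import logging
from typing import Callable


def synthesize_with_fallback(
    text: str,
    providers: list[tuple[str, Callable[[str], bytes]]],
) -> bytes:
    """Try each TTS provider in order; return audio from the first
    that succeeds. Raises only if every provider fails."""
    last_err: Exception | None = None
    for name, synth in providers:
        try:
            return synth(text)
        except Exception as err:  # provider outage, timeout, etc.
            logging.warning("TTS provider %s failed: %s", name, err)
            last_err = err
    raise RuntimeError("all TTS providers failed") from last_err
```

The same pattern applies to STT and LLM vendors; keeping the fallback logic in your own orchestration layer (rather than a vendor SDK) is what makes it portable.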

Source: AI Engineer via YouTube

❓ What do you think of the ideas shared in this talk? Share your thoughts in the comments!