
Building an AI voice agent for a contact center requires more than just selecting a great-sounding voice. Your text-to-speech (TTS) provider plays a crucial role in ensuring natural conversations, low latency, and high reliability. But choosing the right text-to-speech provider for AI voice agents isn’t just about voice quality—it’s about selecting the right tool for the specific job, whether it’s an IVR replacement, outbound sales call, support call, or collections call.
In this blog post, we’ll explore:
A contact center AI voice agent operates in a real-time conversational loop:
This loop must be as fast as possible to maintain a smooth, real-time conversation. TTS alone typically takes around 300 ms, but once speech-to-text (STT), LLM processing, and telephony are factored in, total response time usually falls in the range of 800–1200 ms. The TTS provider is responsible for delivering a clear, natural voice as quickly as possible, which makes latency and quality the key factors in choosing a provider.
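To make the latency budget concrete, here is a minimal Python sketch that adds up one turn of the loop. Only the roughly 300 ms TTS figure and the 800–1200 ms total range come from this post; the STT, LLM, and telephony numbers are illustrative assumptions, not measurements.

```python
# Hypothetical per-stage latencies in milliseconds for one conversational turn.
# Only the ~300 ms TTS figure comes from this post; the others are assumed.
STAGE_LATENCY_MS = {
    "speech_to_text": 200,  # assumed
    "llm": 400,             # assumed
    "text_to_speech": 300,  # figure quoted in the post
    "telephony": 150,       # assumed
}


def total_latency_ms(stages):
    """Sum the latency of every stage in one turn of the loop."""
    return sum(stages.values())


def tts_share(stages):
    """Fraction of the total turn latency spent in text-to-speech."""
    return stages["text_to_speech"] / total_latency_ms(stages)


if __name__ == "__main__":
    total = total_latency_ms(STAGE_LATENCY_MS)
    print(f"total: {total} ms, TTS share: {tts_share(STAGE_LATENCY_MS):.0%}")
```

Under these assumed numbers, TTS is a bit under a third of the turn, so shaving time off the TTS stage has a visible effect on the whole conversation.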
The following are the factors Regal considered when deciding which TTS providers to incorporate into our AI Agent Platform. Note: we’re not considering price here because all of the TTS providers are competitive on price within reason, and prices will continue to come down. The assumption is that we’re applying AI Voice Agents to conversations that are currently handled by humans, so picking the best TTS provider matters more than picking the cheapest.
Voice quality determines how natural and engaging your AI voice agent sounds. A well-tuned voice can improve customer experience, reduce frustration, and create a more human-like interaction. Some TTS providers offer backchanneling support, allowing the AI to interject with sounds like "uh-huh," "got it," or "I see" to create a more fluid, natural conversation.
Not all TTS providers optimize for real-time contact center conversations—some focus more on voices for creative applications like audiobooks, video voiceovers, and dubbing. For contact centers, the most important factors are:
✅ Natural intonation – Does it mimic real human speech?
✅ Emotion & expressiveness – Can it shift tone based on context?
✅ Backchanneling support – Does it enhance conversational flow?
We’ve found that ElevenLabs, Play.ht, and Rime all offer a subset of voices optimized for contact centers, with ElevenLabs having the largest library of realistic voices. OpenAI serves as a great backup provider, while PolyAI and Google voices sound more voice-assistant-like and Deepgram’s voices are far too robotic for this use case (but cheap).
Voice consistency ensures that the AI delivers responses predictably. Inconsistent output can make interactions feel unnatural or erratic. Interestingly, when prospects demo Regal AI Voice Agents, they tend to prefer expressive voices, but when deploying in real contact center environments, they prioritize consistency over expressiveness.
Customization allows businesses to refine the voice agent’s tone, style, and pronunciation. Some providers offer voice cloning, SSML support, and phonetic control to enhance flexibility.
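As an illustration of what SSML and phonetic control look like in practice, the sketch below builds a small SSML string in Python. The `speak`, `prosody`, `phoneme`, and `break` elements are part of the W3C SSML standard, but which of them a given TTS provider honors varies, so treat this as an assumption to check against your provider's documentation. The helper names and the IPA spelling are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def phoneme(word, ipa):
    """Wrap a word in an SSML <phoneme> tag with an IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def pause(ms):
    """Insert a short silence, in milliseconds."""
    return f'<break time="{int(ms)}ms"/>'


def build_ssml(parts, rate="medium"):
    """Join already-escaped parts into one <speak> document at a given rate."""
    body = "".join(parts)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"


if __name__ == "__main__":
    ssml = build_ssml([
        escape("Thanks for calling "),
        phoneme("Regal", "ˈriːɡəl"),  # illustrative IPA, not a provider default
        pause(300),
        escape("How can I help you today?"),
    ])
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    print(ssml)
```

Escaping plain text before it goes into the markup keeps characters such as `&` from breaking the request.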
Latency is the delay between input and response. Lower latency improves real-time interactions, while higher latency makes the AI seem slow.
Most TTS providers now offer similar average latency, but the consistency of latency varies:
Rime: provides an on-prem option; however, the company is not yet as well funded or established as the other providers.
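The difference between average latency and consistency of latency is easy to see with a small calculation. The Python sketch below compares two made-up sets of latency samples that share the same mean but have very different tails. The sample values and provider labels are hypothetical and do not describe any real provider.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical latency samples in milliseconds (not real measurements).
PROVIDER_A = [290, 300, 310, 295, 305, 300, 298, 302, 300, 300]
PROVIDER_B = [220, 230, 240, 225, 235, 230, 228, 232, 230, 930]

if __name__ == "__main__":
    for name, samples in (("A", PROVIDER_A), ("B", PROVIDER_B)):
        print(f"provider {name}: mean={mean(samples)} ms, p95={percentile(samples, 95)} ms")
```

Both sample sets average 300 ms, but the second has one slow response that dominates its 95th percentile. On a live call, that occasional slow turn is what the customer notices, which is why tail latency is worth tracking alongside the average.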
If your contact center serves global audiences, language and accent support is crucial. ElevenLabs offers the most natural multilingual voices (and has a community library with voices in many accents), while Play.ht supports the widest range of languages.
So with all of these considerations, which is the right text-to-speech provider for AI voice agents at your company? That really depends on your use cases. Contact centers can use multiple TTS providers, assigning different ones to different types of calls for better engagement and cost optimization.
At Regal, we understand that no single TTS provider is perfect for every use case, or stable enough to rely on exclusively. That’s why we offer multiple providers that excel at different things – ElevenLabs, PlayHT, OpenAI – giving contact centers the flexibility to:
✅ A/B Test Multiple Providers – Easily compare voice quality, latency, and performance.
✅ Use Different Voices for Different Use Cases – Choose the best provider for IVR, sales, support, or collections.
✅ Support Backup TTS Providers – Prevent outages by automatically switching to a secondary or tertiary provider when needed.
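The routing and backup ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Regal's implementation: the provider functions are stand-in stubs, and the names `synthesize_with_fallback`, `ROUTES`, and `AllProvidersFailed` are hypothetical. In a real system each stub would wrap a provider SDK call with its own timeout.

```python
from typing import Callable, Sequence, Tuple

Synthesizer = Callable[[str], bytes]


class AllProvidersFailed(Exception):
    """Raised when every configured provider failed for a request."""


def synthesize_with_fallback(text: str, providers: Sequence[Tuple[str, Synthesizer]]):
    """Try each provider in order and return (provider_name, audio) from the first success."""
    errors = []
    for name, synth in providers:
        try:
            return name, synth(text)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers (hypothetical): the primary simulates an outage.
def primary_tts(text: str) -> bytes:
    raise TimeoutError("simulated outage")


def backup_tts(text: str) -> bytes:
    return b"audio:" + text.encode("utf-8")


# Hypothetical per-use-case routing: an ordered provider list for each call type.
ROUTES = {
    "sales": [("primary", primary_tts), ("backup", backup_tts)],
    "ivr": [("backup", backup_tts)],
}

if __name__ == "__main__":
    used, audio = synthesize_with_fallback("hello", ROUTES["sales"])
    print(f"served by {used}: {len(audio)} bytes")
```

Keeping the provider order in a per-use-case table means the same fallback function can serve IVR, sales, support, and collections calls with different primary and backup choices.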
Ready to build a high-performance AI voice agent for your sales and support interactions? Click here to learn more about our AI Agents or click here to reach out to Regal.ai today to see how we can power your contact center with the best AI voices in the industry!