I had run across the online demo of the AT&T Natural Voices TTS engine quite a while ago, but seeing a reference to it in an article today on Slashdot reminded me of how cool it is. You enter text in a text box and choose a voice from a list of thirteen. The list includes five US English, two Latin American Spanish, two French, two German, and two UK English voices, with a nearly even balance between male and female voices.
After you submit the request, a WAV file is generated immediately on the AT&T server and downloaded by your web browser. The quality is really quite good.
While you can, of course, enter English text and have it synthesized by a Spanish, French, or German voice, the text won’t be translated. That would require a little hacking to pipe the output of Altavista Babel Fish, Google Translate, SYSTRAN, etc. into this form. However, if you were ever wondering how someone from a German-, Spanish-, or French-speaking country might mispronounce a particular English word, here’s your chance to find out.
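For the curious, here is a minimal sketch in Python of what that translate-then-speak pipeline might look like. Everything in it is an assumption for illustration: `translate_then_speak`, `fake_translate`, and `fake_synthesize` are hypothetical names, and the two stand-in steps do not call any real translation service or the AT&T demo form. A real version would replace them with whatever HTTP requests those sites accept.

```python
from typing import Callable


def translate_then_speak(
    text: str,
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Translate the text first, then hand the result to a TTS step."""
    translated = translate(text)
    return synthesize(translated)


# Stand-in steps for illustration only (hypothetical). Real ones would call
# a translation service and submit the TTS demo form.
def fake_translate(text: str) -> str:
    lookup = {"good morning": "guten Morgen"}
    return lookup.get(text.lower(), text)


def fake_synthesize(text: str) -> bytes:
    # Pretend these bytes are the WAV audio returned by the TTS server.
    return text.encode("utf-8")


audio = translate_then_speak("Good morning", fake_translate, fake_synthesize)
```

Passing the two steps in as functions keeps the pipeline itself independent of any particular translation or TTS site, so either one can be swapped out without touching the other.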
FYI, the free VoiceXML hosting service at TellMe uses Natural Voices. BeVocal offers a couple of options on their hosting service – male and female US English voices, a Spanish female voice from Nuance Vocalizer, and a German female voice from ScanSoft RealSpeak. I think that the US English voices, Jennifer and Mark, are RealSpeak voices.
It turns out that ScanSoft also has a demo page for RealSpeak. ScanSoft/SpeechWorks acquired RealSpeak from Lernout & Hauspie a few years ago after L&H self-destructed. You first pick from 19 different languages, and then from three different sampling frequencies. Assuming you are going to be using the TTS on PSTN phone calls, pick 8 kHz. On the next page, you select a voice (if more than one is offered for the language you chose) and then enter up to 100 characters to be synthesized. Unfortunately, when I clicked the Next button, I got an error message from my web browser that “The connection was refused when attempting to contact demo.lhsl.com”. Perhaps this is a temporary problem, or maybe they no longer support the demo running from that old L&H domain but haven’t yet updated the product pages on the ScanSoft website?
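If you do download audio from one of these demos and want to confirm it really is at the 8 kHz rate used for telephone audio, a quick check with Python's standard `wave` module might look like the sketch below. The helper names (`wav_sample_rate`, `make_silent_wav`) are my own, and the silent test clip is generated locally as a stand-in for a file saved from a demo page.

```python
import io
import wave

TELEPHONY_RATE_HZ = 8000  # sampling rate to pick for PSTN phone calls


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Return the sampling rate (Hz) recorded in a WAV file's header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


def make_silent_wav(rate_hz: int, seconds: float = 0.1) -> bytes:
    """Build a short silent 16-bit mono WAV, standing in for a demo download."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    return buffer.getvalue()


sample = make_silent_wav(TELEPHONY_RATE_HZ)
is_telephony_ready = wav_sample_rate(sample) == TELEPHONY_RATE_HZ
```

For a file saved from a demo page, you would read its bytes from disk and pass them to `wav_sample_rate` instead of using the generated clip.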
ScanSoft also has a demo page for their premier TTS engine, Speechify. You get to pick from eleven different languages and voices. The text input is limited to 255 characters.
Nuance offers a demo of Vocalizer, but you have to fill out a form. Fortunately, you get immediate access to the demo after submitting the form. The Vocalizer demo includes a mix of eight language and voice combinations. The text you enter is limited to 200 characters.
As an aside, voices for TTS engines are traditionally given a first name. The name is typical of the language, e.g., Tomoko for a Japanese voice, Maarten for a Dutch voice, and Javier for a Spanish voice. The Vocalizer voices have a first name and last name, e.g., the male Australian voice is Josh Donnelly and the female Latin American Spanish voice is Catalina Romero.
If you want to try out more online demos from lots of smaller companies and from research groups and open source projects, check out this page at the University of Texas. It’s a pretty comprehensive list. There is also a good links page at the Oregon School of Science & Engineering website that provides links to a lot of TTS research projects.
Probably the best-known open source TTS engine is Festival, from the University of Edinburgh. You can try an online demo of Festival at a site hosted by Carnegie Mellon University. There are twelve voice and language combinations. The biggest differentiator is the male Scottish voice. Unfortunately, the quality of the Festival synthesized voices is a big step down from what ScanSoft, Nuance, and AT&T have to offer. If you used a Macintosh in the mid-80s, you will have flashbacks to the talking moose. I would say that the output from the Festival engine is on par with the second- or third-tier TTS players, like Microsoft.