Porting Code from TellMe to BeVocal
The porting process went pretty quickly. Fortunately, the Python CGI scripts didn’t have to change. Three cheers for standards and for application communication via XML over HTTP. The VoiceXML files needed only a few changes:
- Add a DTD DOCTYPE declaration to every VXML file so that the VoiceXML syntax checker can validate it against the DTD.
- Use the BeVocal DTD if you use any BeVocal extensions, such as the data tag or the bevocal:foreach tag.
- Every audio tag must have an attribute such as src.
- break tags must be inside prompt tags.
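The last two rules are easy to check mechanically before uploading a file. The following is only a rough sketch using Python's standard library, not how BeVocal's checker works. The function names and the sample document are made up for illustration, and accepting expr as an alternative to src is my assumption about what "an attribute like src" covers.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def lint_vxml(source):
    """Return warnings for audio tags without src/expr and stray break tags."""
    root = ET.fromstring(source)
    warnings = []

    def walk(elem, inside_prompt):
        name = local(elem.tag)
        # Assumption: either src or expr satisfies the audio rule.
        if name == "audio" and not ({"src", "expr"} & set(elem.attrib)):
            warnings.append("audio without src or expr")
        if name == "break" and not inside_prompt:
            warnings.append("break outside prompt")
        for child in elem:
            walk(child, inside_prompt or name == "prompt")

    walk(root, False)
    return warnings


# Hypothetical document written the way TellMe tolerated it.
SAMPLE = """<vxml version="2.0">
  <form>
    <block>
      <audio>Welcome to SoccerPhone</audio>
      <break/>
      <prompt>Scores follow <break/></prompt>
    </block>
  </form>
</vxml>"""

print(lint_vxml(SAMPLE))
```

Only the first audio tag and the first break tag are flagged; the break nested inside the prompt passes.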
Although TellMe also offers a VXML syntax checker, it assumes you are using their DTD. I haven’t yet tried it with a different DTD to see whether it would actually use it. I like the fact that BeVocal requires a DOCTYPE, since it forced me to identify which parts of my code were non-standard.
TellMe also supports the data tag and the foreach tag. The data tag is really cool, as it allows you to return an XML document from a CGI script or a servlet (anything on the other side of an HTTP GET). I hope it makes it into the VoiceXML 2.1 spec.
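To make the server side of the data tag concrete, here is a minimal sketch of a CGI-style response that returns an XML document over HTTP GET. It is not the SoccerPhone code. The element names (games, game, home, away, score), the function names, and the sample match are all invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_scores_xml(games):
    """Serialize (home, away, score, minute) tuples as a small XML document."""
    root = ET.Element("games")  # hypothetical element names throughout
    for home, away, score, minute in games:
        game = ET.SubElement(root, "game", minute=str(minute))
        ET.SubElement(game, "home").text = home
        ET.SubElement(game, "away").text = away
        ET.SubElement(game, "score").text = score
    return ET.tostring(root, encoding="unicode")


def cgi_response(games):
    """Return the full CGI output: header, blank line, then the XML body."""
    header = "Content-Type: text/xml\r\n\r\n"
    return header + '<?xml version="1.0"?>\n' + build_scores_xml(games)


# Placeholder data, not a real result.
print(cgi_response([("Home FC", "Away FC", "2-1", 47)]))
```

A VoiceXML page would then point its data tag at the script's URL and read the returned document, rather than having the script generate a whole VoiceXML page.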
TellMe allows you to treat an audio tag like a prompt tag and does not require that break tags be inside a prompt. I think BeVocal’s stricter interpretations of the spec are correct.
Grammar File Changes
Both TellMe and BeVocal support Nuance grammar files. I had already decided that I would switch over to the standard SRGS XML format as part of the move to BeVocal. My grammar file wasn’t that complicated, but the lack of good examples of a simple SRGS XML grammar made the conversion far more arduous than it should have been. As part of release 0.2, I have posted the source code for both the TellMe version (GSL grammar) and the BeVocal version (SRGS XML grammar) on the SoccerPhone SourceForge project site.
BeVocal requires an xml:lang attribute on a grammar, even if it is a DTMF grammar, for which that attribute is ignored. I haven’t read the SRGS spec closely enough to know if this is an error in their implementation or an oddity of the spec. Also, if I had stuck with Nuance grammar files, I would have needed to specify the grammar type as type="application/x-nuance-gsl" instead of type="application/x-gsl".
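Since simple SRGS XML examples were hard to find, here is a small illustrative one wrapped in a Python check for the xml:lang attribute. It is a sketch, not the grammar from the SoccerPhone release. The rule name, the digit choices, and the helper functions are invented, and the check only mirrors the BeVocal requirement as described above.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace behind xml:lang
SRGS_NS = "http://www.w3.org/2001/06/grammar"     # SRGS XML namespace

# Hypothetical DTMF grammar accepting the keys 1, 2, or 3.
DTMF_GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="dtmf" root="digit">
  <rule id="digit" scope="public">
    <one-of>
      <item>1</item>
      <item>2</item>
      <item>3</item>
    </one-of>
  </rule>
</grammar>"""


def has_lang(grammar_source):
    """True if the root grammar element carries an xml:lang attribute."""
    root = ET.fromstring(grammar_source)
    return root.get("{%s}lang" % XML_NS) is not None


def dtmf_choices(grammar_source):
    """List the text of every item element in the grammar."""
    root = ET.fromstring(grammar_source)
    return [item.text for item in root.iter("{%s}item" % SRGS_NS)]


print(has_lang(DTMF_GRAMMAR), dtmf_choices(DTMF_GRAMMAR))
```

Running has_lang over each grammar file before uploading would catch the missing attribute locally instead of in the BeVocal checker.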
Both sites have really nice online development and debugging tools. Right now, I can’t say that I have a clear favorite. The TellMe site seems a little more cohesive, but the BeVocal site seems more up to date. The TellMe development tools (at least the free, online ones) have improved in only a few minor ways in over a year.
The BeVocal Vocal Debugger looked pretty cool, but I didn’t spend much time with it, as the Trace Tool was sufficient for me to find all the problems.
Text To Speech
TellMe is the big winner here because it uses AT&T Natural Voices, which is far superior to whatever BeVocal is using. In addition to the superior sound quality and accuracy, the TTS engine on TellMe is better at guessing context. The best example is “minute”. Let’s say a game is in progress in the 47th minute. After reading the score, I have SoccerPhone say “minute 47”. BeVocal’s TTS engine pronounces it as “my-nyewt”, as if it were something small. TellMe’s TTS engine pronounces it correctly.