IBM Opens Speech Code

By | September 14, 2004

When I first heard the announcement (the full story from the NY Times requires registration; the excerpt from C|Net doesn’t), I was hoping that IBM was open sourcing their ASR and TTS engines. But it turned out to be two other parts of their voice portfolio.

IBM is donating source code for their Reusable Dialog Components (RDCs) to the Apache Software Foundation. The RDCs were developed as chunks of static VoiceXML code that perform common dialog functions, such as collecting address information or dates. At the spring SpeechTek conference, someone from IBM told me they were porting the RDCs to JSPs that generate VoiceXML. If nothing else, I hope the RDCs will provide good code samples to further popularize the development of VoiceXML applications.
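To make the idea of a reusable dialog component concrete, here is a rough sketch of a function that emits a small VoiceXML form for collecting a date. This is not IBM's RDC code or API. The function name `date_dialog` and its parameters are made up for illustration, and it is written in Python rather than as a JSP. It relies only on the VoiceXML 2.0 built-in `date` field type.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def date_dialog(field_name, prompt_text):
    """Return a VoiceXML 2.0 document (as a string) that collects one date.

    Hypothetical helper for illustration only; not part of IBM's RDCs.
    """
    # Root element; the namespace is written as a plain xmlns attribute.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": field_name + "_form"})
    # type="date" asks the platform to use its built-in date grammar.
    field = ET.SubElement(form, "field", {"name": field_name, "type": "date"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(date_dialog("departure", "What day are you leaving?"))
```

The point of a component like this is that the caller supplies only the field name and the prompt wording, and the surrounding markup stays the same from one application to the next.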

The Call Flow Builder

IBM is also donating some or all of their Voice Toolkit to the Eclipse organization. The Voice Toolkit was reimplemented as plug-ins for Eclipse about two years ago. It’s a pretty nice application, although the last time I checked out the preview version on the IBM alphaWorks site, it had a lot of complicated dependencies. Also, it was supported only on Windows. The official 5.1 release is now available. Unfortunately, it still runs only on Windows.

The Voice Toolkit Call Flow Builder is a fairly simple GUI for creating the basic dialog of a call flow as a directed graph (i.e., boxes and arrows). Once you get the call flow mostly scoped out, you can generate markup from your diagrams. The native XML dialect it generates can be automatically translated into VoiceXML. I don’t know if this feature is in 5.1, but I think they were also planning to support generation of JSPs, HTML, or whatever other markup language you wanted to generate. All you need to provide is the appropriate XSLT script to do the transformation.
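As a rough illustration of turning a boxes-and-arrows call flow into markup, the sketch below walks a small directed graph and emits VoiceXML. It is not the Toolkit's native XML dialect and it does not use XSLT. A plain Python dictionary stands in for the diagram, and plain Python stands in for the transformation step. The `CALL_FLOW` data, the node names, and the `call_flow_to_vxml` function are all invented for this example.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical call flow: node id -> (prompt, {spoken choice: target node id}).
# Each key is a "box"; each entry in the inner dict is an "arrow".
CALL_FLOW = {
    "main": ("Say balance or transfer.",
             {"balance": "balance", "transfer": "transfer"}),
    "balance": ("Your balance is not available in this demo.", {}),
    "transfer": ("Transfers are not available in this demo.", {}),
}


def call_flow_to_vxml(flow):
    """Translate a call-flow graph into a VoiceXML 2.0 document string."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    for node_id, (prompt_text, edges) in flow.items():
        if edges:
            # A node with outgoing arrows becomes a <menu> of choices.
            menu = ET.SubElement(vxml, "menu", {"id": node_id})
            ET.SubElement(menu, "prompt").text = prompt_text
            for phrase, target in edges.items():
                if target not in flow:
                    raise ValueError(f"unknown target node: {target}")
                choice = ET.SubElement(menu, "choice", {"next": "#" + target})
                choice.text = phrase
        else:
            # A node with no outgoing arrows plays its prompt and exits.
            form = ET.SubElement(vxml, "form", {"id": node_id})
            block = ET.SubElement(form, "block")
            ET.SubElement(block, "prompt").text = prompt_text
            ET.SubElement(block, "exit")
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(call_flow_to_vxml(CALL_FLOW))
```

Targeting a different markup language would mean swapping out the body of the loop, which is roughly the role the XSLT script plays in the Toolkit's approach.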

The grammar development tools in the Toolkit Preview were nothing to get excited about, but I’ve yet to see good grammar development tools from anyone, and yes, I’ve looked at a lot of tools. The pronunciation builder was pretty nice, though.

Once you generate the markup, you move into a more traditional programming environment where you edit markup and code. The RDCs mentioned above can help fill out the rest of your app, though I expect IBM will also make them available from the GUI.

The worst thing I could say about the Voice Toolkit was that when I tried it about six months ago, the documentation was pretty bad. There were huge chunks of missing information and way too many typos.

The NY Times article ends with a comical quote from a director of marketing at Microsoft who claims that IBM is following Microsoft. Hmm, I didn’t know that Microsoft had open sourced their speech development tools under an OSI-compliant license. I think that’s news to everyone, including the development team at Microsoft.

Microsoft is clearly the follower in speech platforms and applications development. They’re still pretty far behind, even though they are making good progress. They shouldn’t be ashamed to be a follower in this space. They picked a very good time for entry. It’s just hard to take them seriously when their representatives make laughable claims.
