March 31, 2004
Kai-Fu Lee Keynote at SpeechTEK

Kai-Fu Lee is the VP for Speech Solutions at Microsoft. He spoke at SpeechTEK after Bill Gates last week, going into much more detail on Microsoft Speech Server. Microsoft is targeting medium (25-250 agent equivalents) and large (250+ agent equivalents) enterprises. This was a bit surprising to me, as Speech Server appears to be a typical Microsoft 1.0 release: lacking in features, average in performance, and somewhat less than stable (based on the demos, anyway). I expect they will actually have far more success on the low end, but I understand the need to put on a good show about the product being enterprise-ready.

There isn't a lot that is innovative in their solution. It's good and it's cheap, and there's a lot to be said for that, but mostly it's a clone of what many other companies have been doing with VoiceXML for quite a few years. As Kai-Fu said, Microsoft is good at volume sales. I think they should be proud of what they have created, but they're still a few years behind most of their competitors. The race is on.

Kai-Fu said that customers have told Microsoft that speech systems are too expensive, too complex, too inflexible with respect to scaling and deployment, and not well integrated. Microsoft appears to have taken a good shot at the first complaint. We'll have to wait and see how they do against the others.

A product manager then gave a really basic demo of changing a hotel reservation. The first call failed to connect, but Speech Server managed to answer the second call. This was followed by a demo of a simple multimodal app using Pocket Internet Explorer and speech recognition. The way the Pocket PC UI gave feedback on microphone signal strength was cool.

The biggest news by far was their pricing. Pricing is based on simultaneous speech channels and on processors. They offer a low-end Standard Edition and a high-end Enterprise Edition.

  • Standard Edition - 4-24 channels - $8,000 per processor
  • Enterprise Edition - 24-96 channels per node - $18,000 per processor

The packages include the development tools, a SALT browser, ASR, and TTS. Both editions include the ScanSoft Speechify TTS engine. That's good, because my experience with Microsoft's own TTS software had been pretty iffy. Their ASR software was mediocre, too, but I have heard from several sources that it has improved significantly. You can use the Enterprise Edition with ScanSoft's OSR ASR engine, but you can use only Microsoft ASR with the Standard Edition. Nuance fans need not apply for either version. Perhaps Nuance would not bend to the OEM pricing levels that Wal-Mart, I mean Microsoft, demanded.

Kai-Fu glossed over the fact that you still have to buy all the telephony hardware and software from Intel/Dialogic and Intervoice if you actually want to use Speech Server with live telephone calls. Also, VoIP is not supported; only plain old PSTN-style calls are. The Microsoft website links you to some partner sites where you can request a price quote for a starter system. Microsoft offers a full-featured 180-day trial version of Speech Server, but you still have to buy all the telephony equipment. Even the most basic setup will cost you about $1,000, and that's at a deep discount from partners trying to grab market share.

Standard Edition is an all-in-one box. Everything has to run on the same server, so you might see some performance problems if your applications are complex and have elaborate recognition requirements. You also get no failover capabilities.

Kai-Fu said that speech application development costs are way too high. Microsoft hopes to unleash a significant portion of the alleged 7 million developers using Visual Studio onto building speech apps. I worry that this will be like the early versions of VB and FrontPage all over again, with a sea of really bad speech apps to replace the bad desktop apps and bad websites. What makes it even worse is that voice user interfaces are even harder to design than graphical user interfaces. The Microsoft speech tools are not bad, but they have a very long way to go before your average developer will be able to write a speech app that you can tolerate using more than once.

The presentation was followed by a couple of customer demos, none of which went smoothly. First up was the NYC Department of Education. They have a web portal that parents can use to get info (absences, grades, food menus, etc.) about their kids and their school. They wanted to speech-enable it to offer access to families without computers. However, my understanding from Kai-Fu's speech was that Speech Server supports English only, and I suspect that in many of the families without computers, the parents speak little or no English. The presenter called the number three times before he finally got ringback, but Speech Server never answered the call. After a couple of minutes, an assistant finally got it to work. It wasn't clear whether the problems were operator error or Speech Server failing to answer calls. The speech enabling of the web portal itself was pretty basic. Nothing exciting, but it did show that Speech Server actually worked.

The next demo was a semi-disaster. The executive director for some part of the State of Alabama Corrections had trouble seeing the keyboard and the phone. Like the previous presenter, he brought up their web portal. He made a big deal of saying that he would use his own personal information, so as not to release private information about any other citizen of Alabama. He then proceeded to type in his OWN SOCIAL SECURITY NUMBER, in plain view of a couple hundred people whom he did not know. This brought up a web page with his birth date, height, weight, driver's license number, the license tag on his car, a description of his car, etc. The NYC presenter at least had a fake family in his system to use for demos. I couldn't believe this guy hadn't done the same.

Then the fun began. To demonstrate how a police officer would use the speech app, he called into the system and read in a license tag. Although his voice sounded pretty clear to me, the ASR engine (they didn't say if it was MS or ScanSoft) misrecognized several characters. It then read back some private information about a vehicle owned by a tow truck company in Clayton, Alabama. So much for the protection of the private info of Alabama citizens. After another try using the NATO phonetic alphabet (Alpha, Bravo, Charlie, etc.) for the letters, he got it to work.
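
The NATO alphabet helps because words like "Bravo" and "Delta" are acoustically far apart, while single letters such as B, D, and E are easy to confuse. Here is a minimal Python sketch of the back end of such a dialog, assuming the recognizer hands back the spelling words as one space-separated string. The `plate_from_words` function and the spoken-digit table are hypothetical; this is not how the Alabama application was built.

```python
# Standard NATO phonetic alphabet, mapped back to letters (common variants included).
NATO = {
    "alfa": "A", "alpha": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I",
    "juliett": "J", "juliet": "J", "kilo": "K", "lima": "L", "mike": "M",
    "november": "N", "oscar": "O", "papa": "P", "quebec": "Q", "romeo": "R",
    "sierra": "S", "tango": "T", "uniform": "U", "victor": "V",
    "whiskey": "W", "xray": "X", "x-ray": "X", "yankee": "Y", "zulu": "Z",
}

# Hypothetical spoken-digit table; a real grammar would define its own.
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}

def plate_from_words(utterance: str) -> str:
    """Hypothetical helper: turn recognized spelling words into a tag string."""
    chars = []
    for word in utterance.lower().split():
        if word in NATO:
            chars.append(NATO[word])
        elif word in DIGITS:
            chars.append(DIGITS[word])
        else:
            # Reject anything that isn't a spelling word instead of guessing.
            raise ValueError(f"not a spelling word: {word!r}")
    return "".join(chars)

print(plate_from_words("Alpha Bravo Charlie one two three"))  # ABC123
```

Rejecting unknown words, rather than guessing, is one way to avoid reading back somebody else's vehicle record.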

This was followed by a demo from two people from Grange Insurance in Seattle. Their demo actually worked on the first try.

Finally, an ISV, Solar Software, and an SI, Accenture, gave demos. Their demos went very well. Solar Software speech-enabled Microsoft CRM. Accenture showed a multimodal app of questionable value, but at least it worked. Their argument for going with Speech Server was that it was inexpensive (Kai-Fu Lee prefers the term "better economics") and that they could use the Visual Studio environment they were already familiar with. Given the short time frame for preparing these demos, it's a little tough to do something really fancy, so I probably shouldn't be so hard on these guys.

Posted by Robert at 10:45 PM
March 28, 2004
SoccerPhone for 2004

No, I'm not asking you to vote for SoccerPhone for President in 2004. I'm just letting anyone who cares know that SoccerPhone is working this year without me having to make any changes to the code. Fortunately, the people running the MLS website didn't make any significant changes to the HTML code on the live scores page. In case you are wondering what any of this means:

SoccerPhone is a free, automated service that provides live Major League Soccer scores by phone.

I wrote this application because I wanted to have remote access to updated MLS scores, I wanted to learn how to create VoiceXML applications, and I wanted to learn how to code in Python.
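
SoccerPhone depends on the layout of the MLS live scores page, which is why an unchanged page matters. As a rough illustration of that kind of screen scraping, here is a minimal Python sketch using the standard library's `html.parser`. It is not the SoccerPhone code: the four-cell row layout (team, goals, team, goals) and every name below are hypothetical.

```python
from html.parser import HTMLParser

class ScoreTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows; each row is a list of cell strings
        self._row = None  # cells of the row currently being parsed
        self._cell = None # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # tolerate a missing </td>

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def parse_scores(html: str):
    """Return (team1, goals1, team2, goals2) tuples for rows matching the
    hypothetical layout; header rows and anything else are skipped."""
    parser = ScoreTableParser()
    parser.feed(html)
    parser.close()
    games = []
    for row in parser.rows:
        if len(row) == 4 and row[1].isdigit() and row[3].isdigit():
            games.append((row[0], int(row[1]), row[2], int(row[3])))
    return games
```

A parser like this breaks silently when the site's markup changes, so it's worth checking it against the live page at the start of each season.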

Posted by Robert at 11:56 PM
March 25, 2004
Bill Gates Keynote at SpeechTEK

Bill Gates was the main keynote speaker at SpeechTEK/VisualStudio Live/MS Mobile Devcon on Wednesday. This was the first time I've ever been in the same room with the richest man in the world. Just me, the rich guy, and a few thousand of my very best friends.

The first ten minutes of his speech were fairly content-free. Quick summary: "Hardware sure is getting faster, year after year." Things livened up when he switched to a video of a parody commercial. This is a Microsoft trade show tradition, and it's definitely something I admire Gates for doing. The parodies are usually very funny, and often self-deprecating. This time it was a parody of a series of Microsoft Office commercials that celebrate the accomplishments of the IT worker in a style that reminds me of old NFL highlight videos. He apparently used the same video at the International Consumer Electronics Show in Las Vegas in January.

The clip featured Bill and co-workers (no Ballmer, but for all I know, everyone else was an actor) sitting at a conference room table with an array of PCs, cellphones, Pocket PCs, routers, etc., all laid out and hooked up in a big jumble of wires. As the camera panned across the table and the deep-voiced narrator talked about the hard-working IT staff (I'm not doing the video justice here; it was actually quite funny), the wires ended up hooking into a toaster. Bill, at the other end of the table, pressed a couple of keys on the keyboard and two pieces of toast shot out of the toaster. The camera then cut to Bill jumping up and down in slow motion with toast in hand and celebrating with his co-workers. Lots of poorly executed high-fives, in standard mocking geek style. If you've seen the original commercials, you can probably imagine this better than I can describe it.

Then it cut to Bill, with toast in hand, and his co-workers running down a corridor in the office, gleefully leaping into the air and shouting with huge, stupid grins on their faces. Finally, they all dance around Bill and goofily celebrate as he spins a piece of toast on the ground like a football player celebrating in the endzone after scoring a touchdown. The final text and narration glorifies their proud accomplishment of having used Microsoft Office to program a toaster. You really had to be there.

Gates then talked about four key areas of focus for Microsoft, or at least the areas they wanted to push to this audience:

  • Mobility
  • Speech
  • Web services
  • Location based services

He brought on a staff member to demonstrate new features in Visual Studio 2005, a.k.a. Whidbey. The demonstrator showed off some Visual Basic coding. Overall, Visual Studio 2005 seemed pretty slow, and the compiler was unbelievably slow. The presenter looked like he was just about to give up on it before it finally finished. One new feature they are pushing hard is code snippets. Other IDEs have had this for many years, but it's an innovation for Microsoft. Code snippets could be a good thing or a very bad thing. The feature lets you bring up a context menu and select from a list of a few hundred snippets that Microsoft will provide, plus any snippets you decide to add. Think of a code snippet as boilerplate code, or a template. This could definitely save you a lot of time, but it can also create a copy-and-paste disaster. Rather than using common subroutines, you could end up (especially on a multi-person development team) with many slightly different versions of essentially the same code.
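
A small illustration of that copy-and-paste risk, in Python rather than Visual Basic: two pasted copies of the "same" formatting snippet quietly drift apart, while a shared subroutine keeps every caller consistent. The function names and the currency-formatting example are hypothetical.

```python
# Two pasted copies of the "same" snippet that have drifted apart (hypothetical).
def claim_total_v1(amount: float) -> str:
    return f"${amount:,.2f}"  # includes a thousands separator

def claim_total_v2(amount: float) -> str:
    return f"${amount:.2f}"   # pasted from an older snippet: no separator

# The common-subroutine alternative: one helper shared by every caller,
# so a fix or format change happens in exactly one place.
def format_currency(amount: float) -> str:
    return f"${amount:,.2f}"

print(claim_total_v1(1234.5), claim_total_v2(1234.5))  # $1,234.50 $1234.50
```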

Although he made a disclaimer that, Julia Child style, he was working with a previously prepared UI for his sample app, the presenter claimed that in just three lines of code (he used a code snippet to paste in a bunch more code) he finished a web-based app for working with auto insurance claims.

Another guy came out to talk about Visual Studio for devices. His demo consisted of creating a photo blogging tool on the fly. Admittedly, he had only ten minutes or so, but the app he created really didn't do much, and most of the code that did the real work had been prepared in advance. He published the app to a Microsoft mobile dev portal and then downloaded it to his camera phone. Next, he wanted to use a Pocket PC to show that the photo had appeared on the blog.

The camera switched over to show his Pocket PC, which was displaying a note reminding him about his presentation. In probably the biggest demo disaster of the day, he couldn't dismiss the reminder. His Pocket PC had just locked up. After a short bit of desperate mashing of the buttons and poking the screen with a stylus, he bailed out and switched to a regular PC. While the earlier presenter was able to gloss over the slowness of Visual Studio 2005 by saying "if it was ready, we would have already released it," I'm assuming this guy was using released software.

The only interesting part of his demo was the location services. The blogging app was able to get his location (presumably to the accuracy that cellphone towers will allow) and automatically supply it.

Another Microsoft product manager type then gave a demo of their speech development tools. To no surprise, they are nicely integrated into Visual Studio. He showed off a data table navigator that automatically creates a grammar based on bound data. The grammar editor was very nice, but the prompt editor was quite weak. The tool also provides a built-in simulation environment so you can do basic functional testing of your app.
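
The idea behind generating a grammar from bound data is simple enough to sketch. Here is a minimal Python version that builds a W3C Speech Recognition Grammar Specification (SRGS) XML grammar accepting any value from one column of a data set. This is not how Microsoft's tool works internally; the rows-as-dictionaries input, the column name, and the `choice` rule id are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang

def grammar_from_rows(rows, column, rule_id="choice"):
    """Build an SRGS grammar whose root rule matches any distinct,
    non-empty value found in rows[i][column] (hypothetical data shape)."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "mode": "voice",
        "root": rule_id,
        XML_LANG: "en-US",
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    seen = set()
    for row in rows:
        value = str(row[column]).strip()
        if value and value not in seen:  # skip blanks and duplicates
            seen.add(value)
            ET.SubElement(one_of, "item").text = value
    return ET.tostring(grammar, encoding="unicode")

print(grammar_from_rows([{"city": "Boston"}, {"city": "Chicago"}], "city"))
```

Deduplicating matters because bound data often repeats values, and duplicate items add nothing to the grammar.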

The surprising thing about the speech demos was that all they showed was Microsoft now doing what lots of other companies have been doing for five to ten years.

Although they tried to pass off the pseudo-standard SALT specification as superior to VoiceXML because SALT has multimodal capabilities designed in, they did not demo any multimodal capabilities. In the video they showed of how a hotel might someday use Speech Server, the multimodal examples were pretty gratuitous.

Gates finished up by saying their goal was to provide seamless speech UI across devices. Their vision includes support for pervasive multimodal interaction and speech dictation.

Posted by Robert at 10:27 PM
March 23, 2004
VoiceXML 2.1 Draft

Archive of W3C News in 2004 - Working Draft: VoiceXML 2.1

As announced on the W3C website, the voice browser working group email discussion list, and at the SpeechTEK conference I'm attending, the working draft for VoiceXML 2.1 was released today.

The best news for me is that the <data> tag is part of the draft. The data tag lets you retrieve an XML document via an HTTP request and continue on in the same VXML document. Both Tellme (Hey, what's up with the dumbed-down, Flash-crazed, nearly content-free new Tellme website? Please bring back the old site, which actually contained useful info.) and BeVocal already implement the data tag in their VXML browsers. I used the data tag in SoccerPhone and PhoneBlogger to retrieve an XML document containing configuration data, which I then parse with ECMAScript (JavaScript) to extract the config values.
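
Inside a VXML page, the document fetched by <data> is exposed to ECMAScript as a read-only DOM, so the parsing happens in script. To show the same fetch-then-extract idea outside a voice browser, here is a minimal Python sketch. The `<config>`/`<setting name="...">` format and both function names are hypothetical; the actual SoccerPhone and PhoneBlogger config format isn't described here.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def parse_config(xml_doc) -> dict:
    """Return {name: value} for each <setting name="..."> element.
    Accepts str or bytes; the document format is hypothetical."""
    root = ET.fromstring(xml_doc)
    return {s.get("name"): (s.text or "").strip() for s in root.iter("setting")}

def fetch_config(url: str, timeout: float = 10.0) -> dict:
    """Fetch a config document over plain HTTP GET, as <data> would, then parse it."""
    with urlopen(url, timeout=timeout) as resp:
        # Pass raw bytes so the XML encoding declaration is honored.
        return parse_config(resp.read())
```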

Another advantage of the data tag is that it makes it easier to develop simple XML-over-HTTP web services that you can easily reuse with non-VXML applications. I'm talking about web services even simpler than SOAP and XML-RPC: just good old RESTful-style web services. Without the data tag, the only standard way to get data back to a VXML app was to have the HTTP request return a VXML document to transition to. That makes it hard to reuse your data integration service. You typically end up having to wrap it with a simple VXML document just to keep the dialog going.
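
To make the reuse argument concrete, here is a minimal sketch of a plain XML-over-HTTP endpoint using only Python's standard library. The same GET response could be consumed by a VXML 2.1 <data> element or by any other HTTP client, with no VXML wrapper. The `<scores>` format, the `SCORES` data, and the default port are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape, quoteattr

# Hypothetical data; a real service would query a database or another site.
SCORES = [("Team A", 2, "Team B", 1)]

def scores_xml(games) -> str:
    """Serialize (team1, goals1, team2, goals2) tuples as a small XML document."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<scores>"]
    for t1, g1, t2, g2 in games:
        parts.append(
            f"<game><team goals={quoteattr(str(g1))}>{escape(t1)}</team>"
            f"<team goals={quoteattr(str(g2))}>{escape(t2)}</team></game>"
        )
    parts.append("</scores>")
    return "\n".join(parts)

class ScoresHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = scores_xml(SCORES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port: int = 8000) -> None:
    """Run the endpoint until interrupted (not called automatically)."""
    HTTPServer(("localhost", port), ScoresHandler).serve_forever()
```

Because the response is ordinary XML rather than a VXML page, a web page, a desktop script, and a voice app can all share the one service.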

The <foreach> tag is also pretty handy. I used it for looping through JavaScript arrays in SoccerPhone. Since it is not yet an official part of the spec, I ended up having to implement the SoccerPhone VXML code slightly differently between Tellme and BeVocal.
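
One way to cope with a browser that lacks <foreach> is to unroll the loop on the server, generating one <prompt> per array element before the page is sent. This is a hypothetical Python sketch of that workaround, not the SoccerPhone code; `unrolled_prompts` is an illustrative name. The comment shows the VoiceXML 2.1 <foreach> it stands in for.

```python
from xml.sax.saxutils import escape

# On a VoiceXML 2.1 browser the loop can run client-side, e.g.:
#   <foreach item="line" array="scoreLines">
#     <prompt><value expr="line"/></prompt>
#   </foreach>
# Without <foreach>, the server can emit the equivalent prompts directly.

def unrolled_prompts(lines) -> str:
    """Emit one escaped <prompt> element per item (hypothetical helper)."""
    return "\n".join(f"<prompt>{escape(line)}</prompt>" for line in lines)
```

The trade-off is that the data gets baked into the page, so the server has to regenerate the VXML whenever the array changes.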

Finally, it's really great to see consultation transfer added. Many call center applications are difficult or impossible to implement without support for consultative transfers. Lots of VXML browser vendors added support anyway, just in proprietary ways.

Posted by Robert at 11:33 PM
March 21, 2004
SpeechTEK

I'll be at SpeechTEK Spring across the bay in San Francisco Tuesday through Friday. Leave a comment or email me at robert AT wombatnation DOT com if you'll be there, too, and would like to meet up. I won't be manning the Avaya booth, but I'll definitely stop by there while the Expo is open.

I plan to blog parts of the conference, though I don't know whether I'll do it live or via tape delay.

Posted by Robert at 05:09 PM
Super Automatics

How It Works: Be Your Own Barista, With a Programmable Helper - [Free registration required]

One of my favorite electronic devices at home is my Capresso C2000, which my wife purchased for me several years ago from Whole Latte Love. Capresso was discontinuing the C2000 model, so Whole Latte Love had a closeout sale on it. Though $750 may seem like a lot for a coffee machine, it was originally priced closer to $1,500. I've used it to make on average two strong, absolutely delicious cups of coffee nearly every day since it showed up on my doorstep in a box the size of a typical San Francisco apartment.

I still buy an occasional cappuccino or cafe latte from Peet's or from a small coffee house when I'm far from home and in need of a fix, but I've cut way back on my coffee drink budget. Of course, the Fair Trade certified, organic, Sumatran coffee beans I buy aren't exactly cheap, so I have no idea if I have saved any money. But having access to great coffee in roughly a minute after rolling out of bed is totally worth it.

Posted by Robert at 12:12 PM
Success BBQ

I'm sure Success BBQ sounded like a good name for a restaurant when they opened the place. The barbed wire fence now surrounding the entire building suggests otherwise.

Success BBQ

I took this photo with the camera on my Treo 600 while my wife was driving down the street. I've gotten a bit better at extracting acceptable photos from the camera. After adjusting the contrast and cropping the photo a bit, I scaled it down to about 400 pixels wide. The images start at 640x480. Scaling the photos from the Treo camera down in size using a good image manipulation program (like the GIMP) is almost always a good idea.
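
The resize arithmetic is just aspect-ratio preservation: 640x480 is 4:3, so a 400-pixel-wide copy comes out 300 pixels tall. A tiny hypothetical Python helper makes that explicit; in the GIMP, the chain-link button in the Scale Image dialog keeps the ratio locked for you.

```python
def scaled_size(width: int, height: int, target_width: int):
    """Return (w, h) scaled to target_width, preserving the aspect ratio
    and rounding the height to the nearest whole pixel."""
    return target_width, round(height * target_width / width)

print(scaled_size(640, 480, 400))  # (400, 300)
```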

Posted by Robert at 12:48 AM
March 17, 2004
Recommending VXML and SRGS

World Wide Web Consortium Issues VoiceXML 2.0 as a W3C Recommendation

The W3C advanced VoiceXML 2.0 and Speech Recognition Grammar Specification (SRGS) from Candidate Recommendation status to Recommendation status. Although vendors have been delivering products that implement the VXML 2.0 and SRGS specs for several years now, it's good for the specifications to reach the final stage of approval from the W3C. Hopefully, we will now see quicker progress on CCXML, VXML 2.1, and promotion of SSML to Recommendation status.

Posted by Robert at 07:35 AM
March 16, 2004
Rumsy on the Hot Seat

stevenberlinjohnson.com: Rumsfeld Faces The Nation, And Stammers

Donald Rumsfeld is learning that if you make enough really bad decisions and then later lie about them in public to cover them up, the lies eventually come back to haunt you. During an interview on "Face the Nation," Rumsfeld choked out a denial that he or the President or anyone he knew in the administration had said Iraq posed an immediate threat before we invaded the country.

SCHIEFFER: Well, let me just ask you this. If they did not have these weapons of mass destruction, though, granted all of that is true, why then did they pose an immediate threat to us, to this country?
Sec. RUMSFELD: Well, you're the--you and a few other critics are the only people I've heard use the phrase "immediate threat." I didn't. The president didn't. And it's become kind of folklore that that's--that's what's happened. The president went...

Unfortunately for Rumsfeld, Thomas Friedman had a quote from Rumsfeld catching him in a lie.

Mr. FRIEDMAN: We have one here. It says "some have argued that the nu"--this is you speaking--"that the nuclear threat from Iraq is not imminent, that Saddam is at least five to seven years away from having nuclear weapons. I would not be so certain."

and continuing with the quote from Rumsfeld...

Mr. FRIEDMAN: "No terrorist state poses a greater or more immediate threat to the security of our people and the stability of the world than the regime of Saddam Hussein in Iraq."

Rumsfeld stammered out a final defense in a manner that made even Bush sound eloquent.

Sec. RUMSFELD: Mm-hmm. It--my view of--of the situation was that he--he had--we--we believe, the best intelligence that we had and other countries had and that--that we believed and we still do not know--we will know.

Okay, that's fine. It's well known now that the administration relied on what turned out to be bad intelligence to justify invading Iraq. Maybe you really did believe it at the time. It appears you definitely wanted to believe it. But if you screwed up, just admit it. Don't keep lying about it. If you won't listen to me, go ask Martha about what happens when you lie about bad decisions.

Rumsfeld then mentioned David Kay's statement that we are only about 85% of the way through our search for WMDs. Ooooh, bad move, Donald. That invited Schieffer to introduce a recent quote from Kay.

SCHIEFFER: 'The president should say, "We were simply mistaken and we're determined to find out why" and he said 'Until we say that, it's going to hurt American credibility and delay reforms in intelligence which simply need to be done.'

I fear Rumsfeld and his buddies are going to treat the search for WMDs like a search for space aliens. You can't prove that they don't exist. If you don't find them, that just means you haven't looked long or hard enough.

The CBS News website provides the full transcript of Rumsfeld's interview.

Posted by Robert at 10:51 PM
March 10, 2004
Rhythmbox Reloaded

In a previous post, I described how I found xinf to be a better music player and library for Linux than Rhythmbox, a.k.a. Music Player. Rhythmbox was included on my Fedora install CDs, so I thought I would give it another chance. The short summary is that it is now my primary music player and library, though I will probably also install xinf.

CPU usage, whether due to a newer version of Rhythmbox or due to upgrading to Fedora, now never exceeds 3%, and is usually much lower. Also, Rhythmbox starts very quickly and I've experienced no audio dropouts.

The biggest problem I had before with Rhythmbox was that I could not play back my 192 kbps MP3s. I'm pretty sure it would play 128 kbps MP3s, but maybe I was mistaken. Unfortunately, the error message was too cryptic for me to easily figure out what was wrong. In the current version of Rhythmbox, the error message told me what I needed to know.

Failed to create mad element; check your installation

I needed the mad plugin for GStreamer. Mad is an MPEG audio decoder. If you are using yum as your package manager, the plugin is easy to find. First, make sure you have added rpm.livna.org as a repository. Then, as root, run:

# yum install gstreamer-plugins-mp3

If you want to see what else is available for GStreamer, try:

# yum info 'gstreamer*'

The only major negative I have run into with Rhythmbox is playlist support. For some reason, I can't drag artists or albums to a playlist; it only works with individual songs. The documentation suggests this should work, but there are a couple of other places (e.g., an Organize menu) where the documentation doesn't match the program.

One great sign for the future of Rhythmbox is that lots of new development work is ongoing and there are a lot of contributors.

Posted by Robert at 08:28 PM
March 08, 2004
CDMAPAL

I received my check for $13.86 a couple days ago as part of the settlement of the Compact Disc Minimum Advertised Price Antitrust Litigation. I thought about using it to buy some blank CD-Rs, but I discovered a better option. I donated my check to the EFF.

Donate My Music Check site banner

Okay, so I'm going to cash the actual check, but I'm donating tonight an equal amount to the EFF via DonateMyMusicCheck.com, a site graciously set up and run by Marc Freedman. If you're willing to PayPal the donation, 100% of it goes to the EFF if you have a balance on your PayPal account or if it draws on a checking account.

As member #323 of the EFF (and I've got the original membership card to prove it), I felt obligated to contribute my check. If you haven't joined the EFF, please consider doing so. The EFF is a powerful organization helping to protect our digital rights. Plus, they participate in cool events like Digital Mix.

Posted by Robert at 12:16 AM
March 07, 2004
Stiction?

As I reported last October, when I returned from a 10-day trip and turned my PC on, the main hard drive just made clicking sounds and Windows XP failed to boot. Oddly enough, if I left the machine on long enough and attempted to reboot it enough times, the Linux GRUB boot loader on the first drive would eventually load. I could then boot into Linux on the second drive. Even better, I was able to read the NTFS partition on the first drive from Linux. However, Windows still would not boot.

For the record, the problematic drive is an 80 GB Seagate ST380021A that came with my Dell Dimension 4400. I think the Dell is still under warranty, so I may see if I can get them to replace the drive.

While I was away last week, my wife told me on the phone that the drive (which usually behaved itself once Linux booted) had started clicking. Rather than torment her with the sound, I told her to shut down the system. When I got back a few days later and booted the machine, the drive would not even load GRUB.

On the assumption that the problem with the drive was stiction (a.k.a. static friction), I tried to break it loose by rotating the drive with a quick wrist snap while it was running and making the clicking sounds. I also tried thumping the drive on all four sides, but no luck. If the problem is stiction, it will take a pretty strong force to overcome it.

Since I was already down to the last GB or two of free space on the second drive, I went to Fry's on Saturday and picked up a 200 GB Hitachi Deskstar ($195, less a $90 mail-in rebate). Is there a problem that can't be solved by going on a shopping spree at Fry's? I didn't think so.

Since I had already acquired the Fedora Core 1 CDs with a copy of Linux User & Developer magazine that I bought in Biloxi last week, I decided to do a fresh Fedora install on the new drive rather than upgrade my RH9 install. RH9 was the first Linux install I had tried to heavily customize, and I had a few learning experiences that I wasn't sure how to undo. Problem solved. More to come on the Fedora install, which is mostly going very, very well.

Posted by Robert at 10:09 PM
March 06, 2004
Mojo Nixon Retires

Mojo Retiring

Back when I was the GM at KTRU, Mojo Nixon came into the station for an interview. After the interview, Ray and I took him into the production studio to record a station ID. We rolled tape and after a couple seconds of him making all kinds of wild yelling sounds (if you've ever seen him perform, you know exactly what I'm talking about), he said something like, "This is Mojo Nixon on KTRU, F-Word Radio." We asked him where he came up with that phrase. He said it just popped out of his head spontaneously while he was babbling. Again, if you've seen him perform, I don't think you would be surprised.

I decided to make "F-Word Radio" our slogan. It was pretty sweet to see it in the Arbitron ratings:

...
KSBJ - Something Better Jesus
KTRU - F-Word Radio
...

When Ray and I saw Mojo performing at a show the next year, I was wearing my F-Word Radio tour shirt that we had printed up for our trip to the New Music Seminar. Mojo liked it so much he tried to pull it over my head and take it away.

And that was my brush with mojolebrity.

Posted by Robert at 12:51 AM