As seen recently on Slashdot, MGM settled a class action suit involving the image quality of DVDs they sold under the classification of “widescreen”. The claim is that the image width of the alleged widescreen versions was no different from that of the standard versions. As per normal class action settlement verbiage, MGM denies any wrongdoing.
The website has a link to a five-page PDF containing a long list of the movies in question. If you purchased any of these movies between December 1998 and 8 September 2003, you can trade in each one for either $7.10 or a new DVD from a list of 325 titles. You have to submit a request for a claim form before you get to see the list of 325 titles.
I very quickly scanned the list and found at least four titles that I had purchased or received as a gift, potentially during that time period – 24 Hour Party People, Powaqqatsi, Koyaanisqatsi, and This Is Spinal Tap. So, I called the toll free number to obtain a claim form. As I was calling, of course, I was thinking about how an automated speech application would be a perfect solution for collecting the information for the claim request.
I wasn’t surprised to discover that the “claims administrator”, The Garden City Group, had come to the same conclusion. They have a call center in Sarasota, Florida, and have an IVR system or systems that they claim can handle 380 simultaneous calls. The website didn’t say how many of those were DTMF-only ports and how many were speech-enabled ports.
The app wasn’t that great, but they might not have that much experience building speech apps. Unless you’ve had the chance to build a lot of speech apps, it’s hard to develop the expertise required to design a highly conversational app. While there are a lot of good, experienced website designers available, there aren’t very many good, experienced speech application designers.
Back to the app. The good news is there was no need for text to speech, because of the fairly static nature of the app. Of course, the reason it was static is that it didn’t do very much. Only one part of the dialog used speech recognition – the part where you are asked to name a DVD you purchased that you believe is covered by the specifics of the settlement. Since the female voice talent was able to record all the DVD titles in advance, her response included saying the name of the movie back to you. Unless, of course, the movie title wasn’t on the list. Their strategies for handling no matches and mismatches left a lot to be desired.
After the app either recognized a movie you said as being on the list or gave up after three or so no matches or misrecognitions, it proceeded to collect your name and address. Unfortunately, it did this by recording you while you said and spelled the requested information. Presumably, they then had a person transcribe the info. While this is cheaper than having live agents waiting to handle calls 24×7 (especially if they then outsource the transcription to a low wage country), it would be even cheaper if they used the speech recognition engine and a suitable interface to a database of names and addresses. We’ve developed a system like that at work, and it works great for automating the transcription. Our solution is built right into the app, so we can do the transcription in realtime and play it back to the caller for confirmation. If a live agent is confused by the recording of the address (whether due to accent, a poor cell phone connection, dogs barking, etc.), the caller is no longer on the line to ask for confirmation. Also, while automated speech recognition isn’t perfect, human speech reco and transcription aren’t exactly perfect either. Whether the agent misunderstood what I said or merely made a typo when entering the info, I’ve had my name or address transcribed incorrectly by a live call center agent many times.
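The dialog flow described above – recognize a title against the settlement list, confirm it with a pre-recorded prompt, give up after about three failures, then fall back to recording the caller – can be sketched in pseudocode-style Python. This is purely a hypothetical illustration; the real system was presumably built on VoiceXML or a proprietary IVR platform, and all of the function names here are invented.

```python
MAX_ATTEMPTS = 3  # the app gave up after roughly three no-matches

def collect_title(recognize, play_prompt, settlement_titles):
    """Hypothetical sketch of the title-collection dialog.

    recognize(grammar) -> recognized string, or None on no-match/no-input
    play_prompt(text)  -> plays a (pre-recorded) prompt to the caller
    Returns the confirmed title, or None if all attempts fail, in which
    case the app would fall back to recording the caller's name/address.
    """
    for attempt in range(MAX_ATTEMPTS):
        play_prompt("Please say the name of a DVD you purchased.")
        result = recognize(grammar=settlement_titles)
        if result is None:  # no-match or no-input
            play_prompt("Sorry, I didn't catch that.")
            continue
        # Confirm using the voice talent's pre-recorded title prompt,
        # as the actual app did when it said the movie name back to you.
        play_prompt("I heard " + result + ". Is that correct?")
        if recognize(grammar=["yes", "no"]) == "yes":
            return result
    return None  # give up; proceed to the record-and-transcribe step
```

A confirmation turn after each recognition, as sketched here, is also exactly what an in-app realtime transcription system enables for the name-and-address step: the caller is still on the line to correct a misrecognition, which a later human transcriber can never ask for.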