Transcribing Audio Blogs

March 6, 2003

In brief: 27 Feb 2003 [dive into mark]

While I’m going to try to add text transcription to PhoneBlogger, I don’t know how successful it will be.

In one of the items in his Feb 27 blog entry, Mark suggests that when you are audio/phone blogging, “any audio content needs to be supplemented with a simultaneous text transcript.”

The problem with this in most cases is that speaker-independent, natural-language speech recognition is just not up to the challenge yet. While you can buy dictation software that does a reasonably good job of creating transcripts, the really good stuff is too expensive, and its training period is longer than most people are willing to endure, especially the casual audio-blogging customers of a service like Audblog.

The type of speech recognition most people are familiar with is called directed dialog. This means that the recognizer listens only for a restricted grammar. For example, United Airlines’ service at 1-800-824-6200 works well because it is listening only for words related to an airplane flight, like “arrival” or “departure”. Both PhoneBlogger and SoccerPhone use directed dialogs.
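
As a rough illustration of why a restricted grammar helps, here is a minimal Python sketch of one directed-dialog step. The grammar contents, the function name `match_directed_dialog`, and the idea that the recognizer hands back a list of candidate transcripts are all assumptions made for this example. It is not how PhoneBlogger, SoccerPhone, or the United service is actually implemented.

```python
# Hypothetical grammar for a single prompt such as
# "Would you like arrival or departure information?"
# The words and intent labels are illustrative only.
FLIGHT_GRAMMAR = {
    "arrival": "ARRIVAL",
    "arrivals": "ARRIVAL",
    "departure": "DEPARTURE",
    "departures": "DEPARTURE",
}


def match_directed_dialog(candidates, grammar):
    """Return the intent for the first candidate containing a grammar word.

    `candidates` is a list of transcript guesses, best guess first
    (an assumed recognizer output format). Words outside the grammar
    are ignored; None means nothing matched and the caller should
    re-prompt.
    """
    for transcript in candidates:
        for word in transcript.lower().split():
            word = word.strip(".,!?")
            if word in grammar:
                return grammar[word]
    return None


if __name__ == "__main__":
    # A misheard first guess is skipped because it is not in the grammar.
    print(match_directed_dialog(["a rival", "arrival please"], FLIGHT_GRAMMAR))
```

Because anything outside the grammar is simply discarded, the recognizer only has to tell a handful of words apart. Open-ended transcription of a blog entry has no such shortcut.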

On top of this, the usable voice frequency band for a regular PSTN-based phone call is about 300 Hz to 3400 Hz. Interestingly, while most vowel sounds are strongest below 3 kHz (including fundamentals and harmonics), consonants are usually more concentrated above 3 kHz. Since quite a bit of the information in your voice signal can range up to around 5 kHz or so, especially for a child, the limited bandwidth makes recognition over a phone call that much harder. And don’t even get me started on VoIP over the Internet.
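
To make the bandwidth point concrete, the short Python sketch below checks which frequency components of a signal would fall inside the telephone passband quoted above. Only the 300 Hz and 3400 Hz limits come from the text. The component list is made up for illustration and is not measured speech data, and the channel is treated as an idealized band-pass filter.

```python
# Passband limits quoted in the text for a PSTN call.
PSTN_LOW_HZ = 300
PSTN_HIGH_HZ = 3400


def split_by_passband(components_hz, low_hz=PSTN_LOW_HZ, high_hz=PSTN_HIGH_HZ):
    """Split frequencies into (kept, lost) for an idealized band-pass channel."""
    kept = [f for f in components_hz if low_hz <= f <= high_hz]
    lost = [f for f in components_hz if not (low_hz <= f <= high_hz)]
    return kept, lost


if __name__ == "__main__":
    # Hypothetical frequency components in Hz, chosen only for illustration.
    example_components_hz = [120, 250, 700, 1200, 2500, 3800, 4500]
    kept, lost = split_by_passband(example_components_hz)
    print("kept:", kept)
    print("lost:", lost)
```

A real phone channel rolls off gradually instead of cutting off sharply, so this is only a simplification. It still shows how anything between 3400 Hz and 5 kHz never reaches the recognizer.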

4 thoughts on “Transcribing Audio Blogs”

  1. Dan Z.

    I don’t think automatic transcription will work very well, for all the reasons you mentioned and more. (And, personally, I think inaccurate, garbled transcription is worse than no transcription.) But human-based transcription could solve the problem. Here’s a rough idea of how it might work:

    Imagine a peer-to-peer network of audioblog transcriptionists. You join the network and volunteer as a transcriptionist because you like to audioblog, too. You know that by transcribing entries for others, you earn transcriptions for yourself.

    Every so often, you get an instant message with an audioblog entry attached, or a link to an audioblog entry. You transcribe the entry, e-mail it to a given address (along with a code in the subject line that identifies the blog entry being transcribed), and the transcribed entry is processed and posted on the appropriate blog.

    Later, the blogger who recorded the audio reviews your entry. If she thinks you did a good job transcribing it, she verifies it, and your whuffie is increased. You get more entries sent your way and you become a more trusted transcriptionist. Faster transcriptions also receive more whuffie. Trusted transcriptionists could eventually verify transcriptions from others before they were posted.

    Ideally, you’d be able to limit your potential transcriptionists to specific, trusted people, like your friends. That way, a group of five or six friends could cover transcription duties for each other.

    I’d participate in a system like this.

  2. Robert

    I definitely agree with you on an inaccurate transcription being worse than none. The mocking in Doonesbury of the handwriting recognition capability, or lack of capability, of the early versions of the Apple Newton comes to mind.

    I like your idea for audioblog transcriptions. If you’re using one of the tools that records the audio part of the blog entry using a mic on your computer, you might as well transcribe it yourself. After all, you, better than anyone, should know what you said. But if you are audio blogging remotely, then using a small, trusted group of other people greatly increases the chance that the audio will be transcribed more quickly.

    Of course, it might also just be more fun to trade off transcription work with others. Transcribing your own audio would probably get boring after a while.

  3. Bill Kearney

    As a former Newton developer I can tell you recog is a tough nut to crack. Collective transcribing seems like a lost cause. It sounds cool until you realize nobody’s going to bother doing it. And to guard against people transcribing stuff deliberately in error means you’ll have to scale up a whole trust network and validation mechanism.

  4. Hired Hand Transcription

    Deliver quality documents in a timely manner at a competitive price so that Hired Hand Transcription will be your first choice for all your transcription needs.
