PhoneBlogger


3/20/2011: 3:06 pm: Google App Engine, PhoneBlogger, Python, VoiceXML

In late 2002, I thought it would be cool to build an application that allowed you to blog by phone. Tools, libraries and hosted services were a bit more limited back then, but after a few months of learning, coding and debugging, I managed to release the first version of PhoneBlogger in January 2003. Along the way, I learned a lot about Python, VoiceXML, JavaScript, XML-RPC, audio encoding, shared web hosting and command line tools for Linux.

Fast forward nearly ten years and not only have the tools and libraries come a long way, but there are many more free or inexpensive hosted services that simplify building a tool/service like PhoneBlogger. Instead of hosting the application code on a shared hosting site, I can now build and deploy on Google App Engine. Though scalability is not an issue for my personal use of PhoneBlogger, if it were turned into a public service, App Engine would make scaling much simpler and more economical. App Engine also makes deployment a snap, though with a small amount of work, so would Fabric. For my PhoneBlogger rewrite, I decided to use App Engine.

In the original version of PhoneBlogger, I coded a bunch of static VoiceXML and JavaScript for managing the telephone interaction with a caller. At the time, three of the most prominent services for VoiceXML developers were Tellme (now owned by Microsoft), BeVocal (now owned by Nuance) and Voxeo (still independent). I had to write slightly different code for Tellme and BeVocal, but the differences weren’t that significant. I think it would have been pretty simple to port to Voxeo, as well. Improved support of VoiceXML 2 would now likely allow me to use the same code on each platform.

While VoiceXML is still a great option for building speech apps, a couple of new services bring you simple APIs for building speech or DTMF (touchtone) applications, at the cost of portability. This time around I’ve started with Twilio. I very quickly turned a Python/GAE example from the Twilio website into a DTMF app for tweeting by phone. Although speech recognition allows you to build much more complex and natural applications, many simple applications can be built quickly and easily with just support for pressing keys to provide input. PhoneBlogger falls into that category, for now.
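As a rough illustration of how little is needed for a key-press menu, here is a minimal sketch that builds the TwiML (Twilio's XML response format) for a one-digit menu and for the follow-up recording step. It is not the code from the PhoneBlogger repository: the function names, prompts, and action URLs are made up for the example, and it uses only the standard library rather than Twilio's helper library. The `Gather`, `Say`, `Record`, and `Hangup` verbs are standard TwiML.

```python
import xml.etree.ElementTree as ET


def menu_twiml(action_url):
    """Return TwiML that reads a menu and waits for a single key press."""
    response = ET.Element("Response")
    # Gather collects DTMF digits and posts them to action_url as "Digits".
    gather = ET.SubElement(
        response, "Gather", numDigits="1", action=action_url, method="POST"
    )
    ET.SubElement(gather, "Say").text = (
        "Press 1 to record a tweet. Press 2 to hang up."
    )
    # This Say is reached only if the caller presses nothing.
    ET.SubElement(response, "Say").text = "No input received. Goodbye."
    return ET.tostring(response, encoding="unicode")


def handle_digit(digit, recording_action_url):
    """Return the TwiML for the next step, given the key the caller pressed."""
    response = ET.Element("Response")
    if digit == "1":
        ET.SubElement(response, "Say").text = "Record your message after the beep."
        # Record posts the recording's URL to recording_action_url when done.
        ET.SubElement(
            response, "Record", maxLength="30", action=recording_action_url
        )
    else:
        ET.SubElement(response, "Say").text = "Goodbye."
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")
```

A web handler would return `menu_twiml(...)` for the incoming call and `handle_digit(...)` when Twilio posts the pressed digit back.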

One very convenient thing about Twilio is that I can use their platform to capture and host recordings in a format that is simple to play back in a web browser. If I were really concerned about longevity of the recordings I could easily retrieve them and store them elsewhere, but I’m okay with keeping them on Twilio servers for now. That’s an easy enhancement to add later. The biggest downside for tweeting the Twilio links is that the Twilio recording URLs are ginormous. Fortunately, the goo.gl URL shortener made quick work of that problem.

I’m also going to take a look at porting my code to Tropo, which is a service offered by Voxeo. Tropo is built on Voxeo’s Prophecy platform and offers speech recognition as an option.

I decided to begin the rewrite by first supporting tweeting by phone. Twitter offers a great API, which is made even simpler by libraries like Tweepy. I highly recommend first checking out the OAuth support in any library for Twitter you might consider using. OAuth can be a complex beast, but libraries like Tweepy make it almost trivial.
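To give a feel for how short the Twitter side can be, here is a hedged sketch. `build_status` is a hypothetical helper (not from the PhoneBlogger code) that squeezes a message and a shortened recording link into a character limit. `post_status` uses the `OAuthHandler`, `set_access_token`, `API`, and `update_status` names that Tweepy offered at the time; current Tweepy releases and the current Twitter API may differ, so check the docs before relying on it. The credentials are assumed to have been obtained already through the OAuth flow.

```python
def build_status(message, short_url, limit=140):
    """Combine a message and a link, truncating the message to fit limit.

    The limit is a plain character count; how Twitter itself counts
    links is not modelled here.
    """
    room = limit - len(short_url) - 1  # one space before the link
    if room < 4:
        raise ValueError("link leaves no room for a message")
    if len(message) > room:
        message = message[: room - 3].rstrip() + "..."
    return message + " " + short_url


def post_status(status, consumer_key, consumer_secret,
                access_token, access_token_secret):
    """Post a status with Tweepy (third-party; imported only when called)."""
    import tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    return tweepy.API(auth).update_status(status)
```

Keeping the text assembly separate from the network call makes the length handling easy to check without touching Twitter.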

The original PhoneBlogger source code and a couple of iterations of it are available on SourceForge. I wasn’t particularly interested in learning about CVS at the time, so I just uploaded tarballs of all the code. While SourceForge has improved a lot, I’ve become more of a fan of GitHub. Google Code, Launchpad and Bitbucket are also great options. I started using Launchpad when working on a Java library for Gearman, but then set up a couple of repos on GitHub when I started working on Log4mongo-Java. I’m much happier with Git, Bazaar and Mercurial than Subversion and CVS (Caveman Versioning System). I’ve already started posting code for the new phoneblogger project on GitHub.

As of now, the new version of PhoneBlogger supports tweeting by phone. All the code is on GitHub, along with a README file with the basic steps to set it up for yourself. In an upcoming blog post I’ll walk through those steps in a little more detail.

3/26/2008: 11:31 am: Blogging and RSS, PhoneBlogger, Speech

I just read about a new site called Utterz that goes far beyond what I did with PhoneBlogger a little over four years ago. A couple of other voice blogging by phone sites have cropped up in the intervening years, but their feature sets were generally the same or only slightly better than what I had built into PhoneBlogger. Their main advantage was hosting the service for you, albeit at a cost to you. I wasn’t surprised when I read about most of them closing up shop.

At Utterz, however, voice blogging by phone is just one of the services they offer. They even offer dial-in numbers outside the US. And even better, it’s free.

One thing I’m concerned about from reading the FAQ is that it looks like they just use caller ID to determine who you are when you call in. Caller ID is easy to spoof, so it would be simple for me to post a voice recording to someone else’s Utter page. If they had configured connections to their blog, my post would even show up there.

Another downside is that it appears that the voice app is just a DTMF app. That really limits what they can do. For example, one of the features I keep planning to add to PhoneBlogger is to let you tag or categorize the voice post. With a speech app that would be very easy to do. Some of the blogging APIs let you retrieve a list of all the existing categories. It’s trivial to turn those into a grammar and prompt the caller to say one of them. Good luck specifying a category with a DTMF app (press 1 for LOLcats, press 2 for flying spaghetti monster, …, press 19 for fishsticks are go, …). Actually, you could do better by having the caller enter the first two digits from the phone keypad that map to the first two letters of the desired category. Unless the caller has a lot of popular categories starting with the same letter, the app would then have to present at worst a short disambiguation list. Still, a speech app would be much better, especially if you want to support adding multiple tags to a post.
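Both ideas are easy to sketch. The snippet below is only an illustration, with hypothetical function names: `matching_categories` maps the first letters of each category to keypad digits and returns the candidates for what the caller keyed in, and `category_grammar` turns a category list into a simple SRGS XML grammar of the kind a VoiceXML 2 app could load.

```python
from xml.sax.saxutils import escape

# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def keypad_prefix(category, length=2):
    """Return the keypad digits for the first `length` letters of a category."""
    letters = [c for c in category.lower() if c in LETTER_TO_DIGIT]
    return "".join(LETTER_TO_DIGIT[c] for c in letters[:length])


def matching_categories(digits, categories):
    """Return the categories whose leading letters map to the keyed digits."""
    return [c for c in categories if keypad_prefix(c, len(digits)) == digits]


def category_grammar(categories):
    """Build an SRGS XML grammar that accepts any one category name."""
    items = "".join("<item>%s</item>" % escape(c) for c in categories)
    return (
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" '
        'xml:lang="en-US" mode="voice" root="category">'
        '<rule id="category"><one-of>%s</one-of></rule></grammar>' % items
    )
```

If `matching_categories` returns more than one entry, the app reads that short list back as a numbered menu; if it returns exactly one, no further prompt is needed.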

11/6/2005: 8:51 pm: Blogging and RSS, PhoneBlogger, Software, Speech, VoiceXML

I haven’t posted about PhoneBlogger in quite a while, but I’m thinking about updating and enhancing some of the code. A lot has happened in the audio/phone blogging world since I announced PhoneBlogger January 9, 2003, and posted the PhoneBlogger source code on SourceForge.

One new buzzword is mobcasting. The Wikipedia page on mobcasting quotes Andy Carvin as writing:

A quick example: imagine a large protest at a political convention. During the protest, police overstep their authority and begin abusing protesters, sometimes brutally. A few journalists are covering the event, but not live. For the protestors and civil rights activists caught in the mêlée, the police abuses clearly need to be documented and publicized as quickly as possible.

This is quite similar to the scenario I was thinking of nearly three years ago when I announced PhoneBlogger:

A journalist could use it from a payphone (good luck finding one, though) or with a basic cellphone to immediately publish to the web from the scene of an unexpected event in progress. It’s moblogging for the people, man.

Note the quaint reference to a payphone. My point was that you wouldn’t need a fancy phone. Of course, mobile phones have come a long way since I wrote that. Carvin’s example also includes the use of camera/videophones, rather than just audio.

My favorite part of the Wikipedia article, though, is near the end where it says:

Carvin is now exploring the creation of an open-source mobcasting tool that could be installed on a server to allow for community mobcasts via a local telephone call.

I’ve been thinking about the same thing, too. While it makes life simpler for me to host the application with a VoiceXML hosting provider like BeVocal, I do like the idea of having a more self-contained app. It’s going to be pretty complicated, though, to sort out everything I need with a free PBX like Asterisk or sipX, a free VoiceXML browser like OpenVXI, a free ASR engine like Sphinx, and a free TTS engine like Festival. Dealing with PSTN calls will also be a hassle. If I implemented this, I would probably just deal with SIP. That led me down the path of looking into building or finding a SIP softphone that could run on a mobile phone. There is a Java API, JAIN-SIP, for building a Java SIP user agent. The phone would need only a J2ME runtime. What with all these acronyms and integration efforts, I think you can guess why I haven’t taken all of this on by myself, yet.

I’m glad to see that people like Andy are doing really interesting things with audio blogging. I built PhoneBlogger solely because I thought it would be fun to build. I never really ended up using it.

4/27/2004: 9:48 pm: PhoneBlogger

There’s a new audioblogging service in town, although AudioBlog.com is currently only in beta. So far, the service supports MovableType, TypePad, Blogger, and LiveJournal blogs. One cool aspect of AudioBlog.com is that in addition to audioblogging by phone, you can audioblog from any computer with a microphone and a Flash-enabled web browser. Also, it looks like Eric has set up the audio recordings for efficient streaming.

[via Audio/Mobile Blogging News]

4/18/2004: 12:21 pm: Blogging and RSS, PhoneBlogger

So, for any of you stopping by my blog because Dave Winer linked to it on scripting.com, just a heads up that PhoneBlogger doesn’t yet support Radio. However, if anyone is interested in adding support for Radio, I would appreciate the help. PhoneBlogger is already modularized to support the differences between Movable Type and Blogger, so I hope it will be easy to add support for Radio. The Movable Type specific code uses the metaWeblog API, so, for all I know, support for a Radio blog may just be a matter of entering the right values in the XML config file.

11/16/2003: 11:06 pm: Blogging and RSS, PhoneBlogger

I just made the PhoneBlogger 0.2 release available from the SourceForge project site. The biggest highlight of the new release is that you can now run PhoneBlogger on a different server than where your weblog is hosted. This means that someone can now run PhoneBlogger as a hosted web service for multiple bloggers.

PhoneBlogger has two main components:

  • Static VoiceXML, JavaScript, grammar, and XML configuration files
  • Python CGI scripts

In the 0.1 release, you could host the static files on a server other than where your weblog was hosted. The VXML files access the configuration info for everything else from a local XML file. This XML configuration file can contain info on as many blogs as you want. The configuration file does not include usernames or passwords. That info is collected during each phone call.
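The post doesn't show the configuration file itself, so the layout below is purely hypothetical: the element and attribute names, the example endpoints, and the `load_blogs` helper are all assumptions made for illustration. In PhoneBlogger the VoiceXML side read this file; Python is used here only to show the shape of the idea, namely one entry per blog and no credentials stored anywhere.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <blog> element per weblog, no usernames or passwords.
SAMPLE_CONFIG = """
<phoneblogger>
  <blog name="personal" api="metaweblog" blogid="1"
        endpoint="http://example.com/mt/mt-xmlrpc.cgi"/>
  <blog name="work" api="blogger" blogid="2"
        endpoint="http://example.org/api"/>
</phoneblogger>
"""


def load_blogs(xml_text):
    """Return a dict mapping each blog name to its connection settings."""
    blogs = {}
    for node in ET.fromstring(xml_text).findall("blog"):
        blogs[node.get("name")] = {
            "api": node.get("api"),
            "blogid": node.get("blogid"),
            "endpoint": node.get("endpoint"),
        }
    return blogs
```

The blog name the caller gives on the phone becomes the lookup key, and the username and password collected during the call are passed along with the selected settings.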

Therefore, one person could configure PB to post to more than one of their blogs. Also, you could configure PB to let more than one person post to more than one blog. A limitation of the 0.1 release, though, was that the Python CGI scripts had to be on the same server as the weblogs. I used an operating system file copy command to place the incoming WAV file directly into a sub-directory of the weblog before converting it to an MP3.

In the new release, I added support for the newMediaObject XML-RPC call. That allows PB to upload the recorded audio file to your weblog over HTTP. If you do host your weblog on the same server as the CGI scripts, though, you will want to configure PB to continue to use the file copy command for performance reasons.
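For reference, the upload looks roughly like this. It is a sketch written against Python 3's `xmlrpc.client` rather than the original PhoneBlogger code, and `upload_audio` is a hypothetical name. The `metaWeblog.newMediaObject` call takes the blog ID, username, password, and a struct with `name`, `type`, and `bits` (the file contents as base64 binary), and returns a struct containing the file's `url`.

```python
import xmlrpc.client


def upload_audio(server, blog_id, username, password, filename, audio_bytes):
    """Upload an MP3 to the weblog and return the URL the weblog assigns it.

    `server` is an xmlrpc.client.ServerProxy (or any object exposing the
    same metaWeblog.newMediaObject method).
    """
    media = {
        "name": filename,
        "type": "audio/mpeg",
        "bits": xmlrpc.client.Binary(audio_bytes),  # sent as base64
    }
    result = server.metaWeblog.newMediaObject(blog_id, username, password, media)
    return result["url"]
```

In use, `server` would be something like `xmlrpc.client.ServerProxy(endpoint)`, where `endpoint` is the weblog's XML-RPC URL. Sending the whole file as base64 inside an XML payload is also why the local file copy is faster when both sit on the same server.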

So, the flow of a phone call to PhoneBlogger now goes something like this:

  • The blogger/caller calls a phone number at a hosted VoiceXML Server provider
  • The VoiceXML Server looks up the URL of the VoiceXML application and begins to run it
  • The VoiceXML app prompts the caller for the blog name, their user name, and their password
  • The VoiceXML app then records the audio the blogger wants to blog
  • The VoiceXML app uses an HTTP GET to send the recorded audio to a Python CGI script
  • The Python CGI script converts the audio to an MP3 file and returns a path identifier for the file
  • The VoiceXML app informs the blogger/caller of this success and then uses another HTTP GET to send the blog name, the username, the password, and the path identifier of the MP3 file to another Python CGI script
  • The second Python CGI script makes an XML-RPC call to the weblog to upload the MP3 file
  • The Python CGI script then uses a different XML-RPC call to post a new entry to the blog that includes a link to the MP3 file
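The last step of that flow can be sketched as follows. Again this is an illustration in Python 3, not the original CGI script, and it covers only the metaWeblog path (the Blogger API call has a different signature). The function names are hypothetical; `metaWeblog.newPost` takes the blog ID, username, password, a struct with `title` and `description`, and a publish flag, and returns the new post's ID.

```python
from html import escape


def entry_html(mp3_url):
    """Build the body of the blog entry, linking to the uploaded recording."""
    return (
        "<p>This post was created with PhoneBlogger. "
        '<a href="%s">Click to listen to the recorded message.</a></p>'
        % escape(mp3_url, quote=True)
    )


def post_audio_entry(server, blog_id, username, password, title, mp3_url):
    """Publish a new entry that links to the MP3 and return its post ID.

    `server` is an xmlrpc.client.ServerProxy (or any object exposing the
    same metaWeblog.newPost method).
    """
    entry = {"title": title, "description": entry_html(mp3_url)}
    publish = True  # publish immediately rather than saving a draft
    return server.metaWeblog.newPost(blog_id, username, password, entry, publish)
```

The `mp3_url` argument is whatever the upload step returned, so the two XML-RPC calls chain naturally: upload first, then post.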

The communication paths and protocols are:

Initiator | Recipient | Protocol
Person | VoiceXML app | telephone call (could be PSTN, VoIP, or SIP-based)
VoiceXML app | Python CGI Scripts | HTTP
Python CGI Scripts | Weblog | XML-RPC over HTTP

Finally, I’ve decided to release PhoneBlogger 0.2 under the Apache Software License, as well as the GPL. Pick the license you like best.

: 4:37 pm: PhoneBlogger

This post was created with PhoneBlogger. Click to listen to the recorded message.

This is the final test run (hopefully) of PhoneBlogger before the 0.2 release.

11/12/2003: 9:13 pm: Arts and Education, PhoneBlogger

I tried to use PhoneBlogger at tonight’s SRL performance, but the V1 rocket engine was just too loud. Huh, what a surprise. When I wrote PB, I never did take into account the requirement of being able to phone blog the firing of a rocket engine.

Oh, well. I’ll put up some photos and videos from the show soon. It was most excellent.

9/3/2003: 11:53 pm: Blogging and RSS, PhoneBlogger

The Sacramento Bee ran an article this week by Rachel Leibrock on moblogging. She interviewed me by email for the story.

“Moblogging is still in a very early growth stage, mainly due to a general lack of awareness of the power of technology and a lack of access to the necessary tools,” explains Oakland-based software architect Robert Stewart.

Unfortunately, the “this” from “power of this technology” in my emailed answer to a question somehow got cut out on the editing room floor. I think it makes it sound like I believe that people aren’t yet moblogging in large numbers because they are oblivious to technology, in general. Of course, I was referring to just the specific technology surrounding moblogging. Honestly, I don’t think people are generally clueless, just mostly.

8/6/2003: 4:01 pm: PhoneBlogger

m-pulse / a cooltown magazine / Blogging Goes Mobile

Amy Cowen wrote a great article on moblogging for the August 2003 issue of mpulse, the HP Labs cooltown magazine. I remember reading about cooltown a few years ago when HP first announced the initiative. As the economy turned down and out, I had assumed that funding for cooltown would’ve been one of the first things HP would cut. I’m glad to see they have maintained what appears to be a decent level of funding for interesting research in mobile computing.

“For several years, HP Labs has been working at the intersection of nomadicity, appliances, networking, and the web. We called our vision of the future cooltown – a vision of a technology future where people, places, and things are first class citizens of the connected world, wired and wireless…”

The article provides a very nice overview of moblogging and of some of the available tools. It gets fuzzy only when she delves into APIs and syndication formats, first referring to “XML-RPC and RCC” and then implying that the “new specification codenamed Echo” would not use “XML and RSS underpinnings for blogs”. Echo might not be based directly on RSS, but it will definitely be XML-based. I’ll cut her some slack, though, since the rest of the article is quite good and trying to summarize the current API and syndication format drama in a single paragraph is a challenge that I wouldn’t want to take on.

In the section on the moblogging community, Amy wrote a bit about my PhoneBlogger tool.

“For example, the developer of PhoneBlogger, an opensource VoiceXML project that allows users to post voice entries to their blogs, blogged the following after learning about LISTENLAB’s audblog: ‘if you are willing to get your hands filthy with electrons, want total control over the blogging tool, and have plenty of free time to spare, let me know and I will help set you up.’ This kind of techno, hands-on approach is a hallmark of the developer community, and its extension into blogging circles gives the blogging community an almost grassroots edge.”

I’m starting to feel kind of bad about doing virtually no development on PhoneBlogger since I released the source code many months ago. I think this is the motivation I needed to get back into the code, add some new features, and simplify the install. Hey, I’ve got to live up to my newfound status of behaving like “a hallmark of the developer community”, as opposed to my usual behavior as a hallmark card for the slacker community.
