Read the latest from the Web Foundation

News and Blogs

Teaching African Entrepreneurs to Develop Voice-Browsable Applications

Web Foundation · August 16, 2011

Kenyan students and Deborah Dahl - mLab East Africa

Guest blog post by Deborah Dahl *

I just finished teaching short courses on developing voice applications in Kenya and Ghana. This was a great opportunity for me to see entrepreneurship in the developing world with a technology that I’ve been working with for many years. Both of the classes were full of enthusiastic, creative, and motivated students. Voice applications have huge potential in the developing world, and the students were quick to realize this. A voice application requires only an ordinary telephone, using speech to talk to users and speech or touchtones to collect their responses. Ordinary phones are cheap and ubiquitous, and, unlike smartphone apps, voice applications don’t require developers to know anything about the user’s device or require users to install anything.

We started out the classes by brainstorming about application ideas that fit voice technology. We came up with a lot of good ideas – for example, appointment reminders, health information, emergency notifications like severe weather, traffic reports (this could be very useful in congested cities like Accra and Nairobi!), and even telling a bedtime story.

The class in Ghana was too short to do much lab work, but in Kenya we actually developed the basics of an appointment reminder application. The application calls patients with an upcoming appointment and checks to see if they’ll be able to make the appointment.  This is a very simple application, but it could be valuable in a country where transportation is difficult, where there aren’t many doctors, or where missing a medical appointment could have serious health consequences. Different teams of students designed and developed different parts of the application, with the whole class participating in reviewing each other’s work. That way everyone had a chance to both design a voice application and to review designs.
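To give a flavor of what the class built, here is a minimal VoiceXML sketch of the confirmation step of such a reminder. The prompts and names are my own illustration, not the students’ code; a real application would also need an outbound-dialing component and an appointment database behind it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm_appointment">
    <!-- The built-in boolean grammar accepts "yes"/"no" or touchtones 1/2 -->
    <field name="can_attend" type="boolean">
      <prompt>
        Hello. This is a reminder that you have a clinic appointment
        tomorrow morning. Will you be able to attend?
        Say yes or press 1; say no or press 2.
      </prompt>
      <noinput>
        <prompt>Sorry, I did not hear you.</prompt>
        <reprompt/>
      </noinput>
      <filled>
        <if cond="can_attend">
          <prompt>Thank you. We look forward to seeing you.</prompt>
        <else/>
          <prompt>Thank you. Someone will call you to reschedule.</prompt>
        </if>
        <exit/>
      </filled>
    </field>
  </form>
</vxml>
```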

The primary technology for voice applications is called a voice browser, analogous to a web browser for graphical applications. Figure 2 compares voice-based access to the web with access through a traditional graphical browser.

Comparing voice access to the web to access with a traditional browser. The boxed area shows voice components.

Fortunately, there is a well-established standard, VoiceXML, that is widely supported by voice browser platforms, and we spent quite a bit of time on coding in VoiceXML. Voxeo, one of the major vendors of voice browsers, provides a free developer’s version of its very good Prophecy VoiceXML platform, which the students were able to use to test their work. Getting into the VoiceXML mindset requires a shift in thinking for developers used to graphical applications, though. For example, since a voice application mimics a conversation, time management is extremely important: the application must decide how long to speak and how long to wait for the caller. However, I think the students caught on to the paradigm pretty quickly.
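As a minimal sketch of that paradigm (the five-second value is just an illustration), timing in VoiceXML is declarative, and every field follows the same speak-then-listen cycle:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- How long the browser waits for the caller to start speaking -->
  <property name="timeout" value="5s"/>
  <form id="hello">
    <field name="wants_weather" type="boolean">
      <prompt>Would you like to hear today's weather? Say yes or no.</prompt>
      <noinput>
        <!-- Runs when the five-second timeout expires in silence -->
        <prompt>I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
    </field>
  </form>
</vxml>
```

There is no event loop to write: the browser speaks the prompt, listens, and raises events like noinput on its own, which is much of the mindset shift for developers coming from graphical applications.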

We also spent quite a bit of time on voice user interface design. I think the students may have enjoyed this less than the more technical aspects of VoiceXML coding, but they realized the importance of user interface design for applications that don’t use a display, keyboard, or mouse. In particular, a key component of a voice user interface is the callflow, the sequence of voice interactions that makes up a voice application. Figure 3 shows a typical voice callflow, for an application that provides sports scores.

The example shows a call flow for a sports scores service
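A callflow like this maps naturally onto a VoiceXML menu, where each branch of the flow becomes a choice. The sport names and form ids below are hypothetical, but the structure is standard:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>
      Welcome to sports scores.
      Say football or press 1; say cricket or press 2.
    </prompt>
    <choice dtmf="1" next="#football">football</choice>
    <choice dtmf="2" next="#cricket">cricket</choice>
    <nomatch>
      <prompt>Sorry, I did not understand.</prompt>
      <reprompt/>
    </nomatch>
  </menu>
  <form id="football">
    <block>
      <!-- A real service would fetch live scores from a server here -->
      <prompt>Here are today's football scores.</prompt>
      <exit/>
    </block>
  </form>
  <form id="cricket">
    <block>
      <prompt>Here are today's cricket scores.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```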



While voice applications are just getting started in Africa, they have given rise to a large industry in the U.S. and Europe over the last ten years, with many hosting options and, for applications that use speech recognition or text-to-speech, a number of technology choices. This is not yet the case in Africa. Although English is widely spoken there, speech recognizers are dialect-specific: recognizers are developed for individual varieties such as American, British, or Indian English, which differ from the versions of English spoken in Africa. Although there is ongoing research to make it easier to develop recognizers for new languages and dialects, touchtone input and recorded voices for output work very well to support many types of useful applications in the meantime.
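The touchtone-plus-recordings approach needs no speech recognizer or text-to-speech engine at all. In this sketch (the crop-price scenario and file names are hypothetical), the field uses the built-in digits grammar for keypad input, each prompt plays a pre-recorded file, and the inline text serves only as a fallback if a recording is missing:

```xml
<form id="prices">
  <!-- Built-in digits grammar, restricted to a single keypress -->
  <field name="crop" type="digits?length=1">
    <prompt>
      <audio src="prompts/choose_crop.wav">
        For maize prices, press 1. For bean prices, press 2.
      </audio>
    </prompt>
    <filled>
      <if cond="crop == '1'">
        <prompt><audio src="prompts/maize_prices.wav"/></prompt>
      <else/>
        <prompt><audio src="prompts/bean_prices.wav"/></prompt>
      </if>
      <exit/>
    </filled>
  </field>
</form>
```

Recording the prompts in a local language is also far cheaper than building a recognizer for it, which is why this pattern suits new markets so well.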

The other important infrastructure component needed in Africa is a choice of hosting platforms that provide local phone numbers. It is possible for voice entrepreneurs to develop their own interfaces to the telephone system, but in practice, the telephony knowledge required for this involves a steep learning curve. On the other hand, outbound calls, where the application calls the user, are feasible without a local host, because the user isn’t paying for a phone call. Some of the hosting providers currently located in Europe and the U.S. are starting to see the potential of the developing world, and it may not be too long before VoiceXML hosting is more available in Africa.




* Deborah Dahl is the Principal at Conversational Technologies, a consulting company in the area of speech and language technologies and applications. She has over twenty-five years of experience in the speech and natural language processing industry. Her primary research interest is in multimodal spoken dialog systems, in particular as applied to assistive technology for individuals with disabilities as well as for the elderly. Deborah has recently collaborated with the Web Foundation for Mobile Entrepreneurs in Ghana and mLab East Africa.

Dr. Dahl has published over fifty technical papers and is the editor of the book Practical Spoken Dialog Systems. In addition to her technical work, Dr. Dahl is active in speech and multimodal standards. She is the Chair of the World Wide Web Consortium’s Multimodal Interaction Working Group, serves as one of the editors of the EMMA (Extensible MultiModal Annotation) specification and is also a member of the Voice Browser Working Group, and the HTML 5 Working Group. She has served as co-organizer of numerous W3C workshops, is a member of the Board of Directors of AVIOS (Applied Voice Input/Output Society), a major organization for speech professionals, and has served as a reviewer for many journals and conferences in the speech and language field.
