Building Applications Frequently Asked Questions

This FAQ contains answers to questions about building VoiceXML applications using Tellme Studio.

  1. What's a voice application?
  2. IVR has been around for years. What's so special about these new voice applications?
  3. What technologies are used to build voice applications?
  4. Is VoiceXML powerful enough to write full-fledged voice applications?
  5. What type of language is VoiceXML?
  6. How do you handle speakers with foreign accents?

Q: What's a voice application?
A: A voice application is a specific type of "phone application". A phone application is an application that interacts with a person over the telephone in an automated fashion. A bank's automated system that answers a call and asks the caller to "press one for your account balance, two to transfer funds between accounts, or three to speak to a customer service representative" is a perfect example of a phone application. Phone applications using the telephone keypad for input, referred to in the industry as Interactive Voice Response (IVR) applications, have been commonplace for many years, especially among large companies such as banks, insurance companies, and airlines.

Recent advances in speech recognition technology have allowed the creation of a new type of application where the user interacts with the application by speaking to it rather than entering information through the telephone keypad. This type of application is called a "voice application".

Q: IVR has been around for years. What's so special about these new voice applications?
A: First, they provide a natural interface for human-computer interaction over the phone. Callers find speaking into the telephone more intuitive than pressing keys on a telephone keypad.

Second, they transcend the limitations of the keypad-based interface. The expressiveness of a voice-driven interface enables much more complex interactions than those supported with the keypad only. For example, selecting one item from a large list of items, such as a particular stock whose quote you'd like to hear, is difficult and awkward when using only the keypad. Though schemes exist that allow the "spelling" of the stock symbol through the keypad, none are as intuitive as just being able to speak the name of the stock.

Third, they implement a hands-free interface perfect for mobile users. IVR applications requiring button presses pull callers' attention away from other activities. Voice applications leave callers' hands and eyes free for whatever else they are doing. This is especially important if the caller is driving or juggling luggage while running through an airport.

Q: What technologies are used to build voice applications?
A: There are three layers of technology required to implement a voice application: the telephony layer, the voice platform layer, and the integration layer.

The telephony layer answers incoming calls, performs call management, and connects the caller with a running instance of an application. This involves the installation and management of carrier connections, switches, call distributors, and the software necessary to keep them up and running.

The voice platform layer provides the environment in which the voice application is run. It is responsible for providing the following functionality:
  • Speech recognition. Interprets callers' spoken input.
  • Streaming audio. Plays audio files for prompting callers and providing information.
  • Text-to-speech. Automatically generates speech when pre-recorded audio isn't available.
  • Voice application interpreter. Coordinates the playing of prompts and the invocation of the speech recognizer, and implements application logic according to callers' responses.
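
As a rough illustration of how these four functions surface in an application, here is a minimal sketch written against the W3C VoiceXML 2.0 syntax. The audio file path, the company names, and the form and field names are hypothetical, and the grammar format accepted by a particular platform, including the Tellme Platform, may differ from what is shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="quote">
    <block>
      <!-- Streaming audio: plays the recording. The enclosed text is
           rendered by text-to-speech only if the recording cannot be played. -->
      <audio src="audio/welcome.wav">Welcome to the stock quote line.</audio>
    </block>

    <field name="company">
      <prompt>Which company would you like a quote for?</prompt>

      <!-- Speech recognition: the grammar lists what the caller may say.
           These company names are made up for the example. -->
      <grammar mode="voice" version="1.0" root="company">
        <rule id="company">
          <one-of>
            <item>Example Airlines</item>
            <item>Sample Bank</item>
            <item>Acme Insurance</item>
          </one-of>
        </rule>
      </grammar>

      <!-- Interpreter: handlers and logic that run according to the
           caller's response (silence, no match, or a recognized answer). -->
      <noinput>Sorry, I didn't hear you. <reprompt/></noinput>
      <nomatch>Sorry, I didn't recognize that company. <reprompt/></nomatch>
      <filled>
        <prompt>Looking up <value expr="company"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The interpreter walks the form from top to bottom: it plays the welcome audio, prompts for the company, hands the grammar to the recognizer, and runs the filled block once the caller's answer matches.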

The integration layer links the voice application with computing infrastructure external to the application. This includes resources such as databases, call-center management systems, transaction processing systems, and legacy applications. The specific technologies used to do this vary based on the systems to be integrated.

Q: Is VoiceXML powerful enough to write full-fledged voice applications?
A: Yes, absolutely! Every one of the applications running on 1-800-555-TELL was written on the Tellme Platform using nothing but VoiceXML and JavaScript. They represent the most extensive set of voice applications ever built.

Another way to think about it is to consider that a voice application is nothing but a Web application with a speech-driven interface. Just like Web applications, voice applications can perform hard-core data processing, integration of disparate data sources and legacy systems, and complex user interactions.
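
To make the Web analogy concrete, the following sketch (W3C VoiceXML 2.0 syntax) collects an account number and posts it to a server, much as an HTML form would. The URL, the field name, and the server-side script behind it are assumptions made for the example, and support for the built-in "digits" type depends on the platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="balance">
    <!-- The built-in "digits" type accepts spoken or keyed digits
         on platforms that support it. -->
    <field name="account" type="digits">
      <prompt>Please say or key in your account number.</prompt>
    </field>

    <block>
      <!-- Send the collected value over HTTP to a hypothetical
           server-side script. The response is expected to be the
           next VoiceXML document to run. -->
      <submit next="http://www.example.com/balance"
              namelist="account" method="post"/>
    </block>
  </form>
</vxml>
```

The script behind the URL can query a database or a legacy system and generate VoiceXML in response, in the same way that a Web application generates HTML.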

Q: What type of language is VoiceXML?
A: VoiceXML is a declarative, XML-based language composed of elements that describe the human-machine interaction provided by a voice response system. This includes:
  • Output of audio files and synthesized speech (text-to-speech).
  • Recognition of spoken and DTMF input.
  • Control of telephony features such as call transfer and disconnect.
  • Direction of the call flow based on user input.
VoiceXML may also embed meta-information, references to other VoiceXML files, and JavaScript code, used to implement client-side logic.
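
The short document below is a sketch of how these pieces can fit together, again using the W3C VoiceXML 2.0 syntax. All file names, the telephone number, and the dialog names are hypothetical, and a given platform may support additional or slightly different elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- Meta-information about the document. -->
  <meta name="description" content="Example main menu"/>

  <!-- JavaScript used for client-side logic. -->
  <script>
    <![CDATA[
      function greeting() {
        return new Date().getHours() < 12 ? 'Good morning.' : 'Hello.';
      }
    ]]>
  </script>

  <!-- Output and input: synthesized speech, an audio file with a
       text-to-speech fallback, and choices that accept speech or DTMF. -->
  <menu id="main">
    <prompt>
      <value expr="greeting()"/>
      <audio src="audio/menu.wav">
        For your balance, say balance or press 1.
        For an agent, say agent or press 2.
        For help, say help or press 3.
      </audio>
    </prompt>
    <!-- Call flow: each choice directs the call to another dialog. -->
    <choice dtmf="1" next="#balance">balance</choice>
    <choice dtmf="2" next="#agent">agent</choice>
    <!-- Reference to another (hypothetical) VoiceXML file. -->
    <choice dtmf="3" next="help.vxml">help</choice>
  </menu>

  <form id="balance">
    <block>
      <prompt>Your balance is not available in this example. Goodbye.</prompt>
      <!-- Telephony control: hang up. -->
      <disconnect/>
    </block>
  </form>

  <form id="agent">
    <!-- Telephony control: transfer the caller to a fictional number. -->
    <transfer name="call" dest="tel:+18005550100" bridge="true">
      <prompt>Please hold while I transfer you.</prompt>
    </transfer>
  </form>
</vxml>
```

Nothing in the document says how to recognize speech or play audio. It only declares what to play, what to listen for, and where to go next, and leaves the rest to the interpreter.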

Q: How do you handle speakers with foreign accents?
A: 1-800-555-TELL is designed to be usable by the widest possible range of speakers. Tellme's work on it has shown that the Microsoft speech recognition engine employed by the Tellme Platform does a very good job with callers who have strong foreign accents.
