Tellme Studio

VoiceXML 2.x Essentials
Last Updated: 06-05-2009

The XML primer described XML, a generic markup language designed to describe any kind of data. Now, you'll learn about VoiceXML, a specific kind of XML designed to describe voice applications. Whereas XML is designed to represent arbitrary data, VoiceXML describes grammars, prompts, event handlers, and other data structures useful in describing voice interaction between a human and a computer.

This document is useful for beginning and intermediate VoiceXML developers. First, you'll be introduced to basic VoiceXML principles by creating a simple example application. Then, you'll drill into a number of important features of the VoiceXML language. Beginners should be able to get started using only the first section; the latter sections should prove useful in obtaining a deeper understanding of the VoiceXML language as you begin to develop more sophisticated voice applications.

A VoiceXML application consists of a set of VoiceXML documents, and each VoiceXML document contains one or more dialogs describing a specific interaction with the user. Dialogs may present the user with information or prompt the user to provide information, and when complete, they can redirect the flow of control to another dialog in that document, to a dialog in another document in the same application, or to a dialog in another application entirely.

At the root of every VoiceXML document is the vxml element. This element should contain one or more elements representing dialogs. VoiceXML 2.x provides two types of dialogs: form and menu. While menu is a convenient shorthand for certain situations, VoiceXML authors will spend most of their time writing form elements; thus, this introduction focuses on the form element.
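For comparison only, here is a minimal sketch of what a menu dialog can look like. The target file names "famous.vxml" and "infamous.vxml" are hypothetical placeholders, and the rest of this introduction does not depend on this example.

```xml
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <prompt>
      Say famous or infamous.
    </prompt>
    <!-- each choice pairs a spoken phrase with a destination -->
    <choice next="famous.vxml">famous</choice>
    <choice next="infamous.vxml">infamous</choice>
  </menu>
</vxml>
```

A menu like this trades flexibility for brevity: each choice supplies both the phrase to listen for and the place to go next.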

The name form is no accident; VoiceXML's primary user interface paradigm is that of a form with a number of elements that either provide information to the user or request input from the user. For example, when you sign up for a credit card, you fill out a form that provides you with instructions for filling in its required fields. The VoiceXML interpreter does much the same thing. This interface should be familiar to Web programmers and users, since it is also prevalent in the world of HTML.

Every form contains one or more form items, which are elements within a form that describe some kind of user interaction related to filling in the form. The first form item that you'll examine is block. A block is a container for "executable content". That is, it contains commands (i.e., VoiceXML elements) that are executed sequentially. A block is often used for presenting information to the user. For example, if you place text within a block, it's treated as a command to queue audio that will ultimately be played to the user. Consider the following VoiceXML document:

<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      Hello, world!
      <exit/>
    </block>
  </form>
</vxml>

This document is the simplest functional VoiceXML document. It consists of a vxml element, which contains a single dialog (a form), which contains a single form item (a block), which contains a single statement indicating that the VoiceXML interpreter should queue the text "Hello, world!". The exit element instructs the interpreter to end execution of the application and return control to the "interpreter context". Before doing so, the interpreter flushes the prompt queue, thereby rendering the text ("Hello, world!") to the user via a Text-to-Speech (TTS) engine.

For a complete list of the elements allowed within a block, see the block element in the VoiceXML 2.x element reference.

Our example above played the audio "Hello, world!" using the TTS engine. However, you may want to improve the quality of your application by recording real speech or other sounds. If you have an audio file that you would like to play to the user, you can use the audio element to do just that:

<audio src="ui/welcome.wav">Welcome to Tellme University!</audio>

This audio element plays the audio file located at the relative URL "ui/welcome.wav". Note that TTS text was specified inside the audio element as well. That way, if the Tellme platform encounters an error while fetching or attempting to play this audio file, it uses the TTS instead. If, however, the audio file plays successfully, the TTS is not played. Tellme recommends that you supply TTS for all your audio.
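To see the audio element in context, here is a sketch that places it inside the block of the earlier "Hello, world!" document. It assumes that a file actually exists at the relative URL "ui/welcome.wav"; if it does not, the TTS fallback text is rendered instead.

```xml
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- recorded audio first; the enclosed text is the TTS fallback -->
      <audio src="ui/welcome.wav">Welcome to Tellme University!</audio>
      <exit/>
    </block>
  </form>
</vxml>
```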

As you have seen, block elements can be used to present information to the user. They can also be used to direct the flow of control within your application. One way to do so is to use a goto element, which tells the VoiceXML interpreter to stop what it's doing and to execute another dialog. The target dialog can be in the same document or in another document. Executing a goto element is analogous to clicking a link on a Web page; both redirect execution (or focus) to another location on the current page or cause the browser to fetch another document across the Web. The following snippet queues TTS and then redirects control to another VoiceXML document at the URL "document2.vxml".

<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
  <block>
    Goodbye, world!
    <goto next="document2.vxml"/>
  </block>
</form>
</vxml>

The following snippet queues some text and then navigates to another form in the same document. The target dialog is referenced by the value of its id attribute. Execution in the "infinity" dialog begins with the block, which queues more TTS and then executes an exit. Before execution of the application ends, the interpreter renders the complete TTS string "To infinity and beyond".

<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
  <block>
    To infinity
    <goto next="#infinity"/>
  </block>
</form>

<form id="infinity">
  <block>
    and beyond
    <exit/>
  </block>
</form>
</vxml>

When navigating to another document, you can choose the dialog where execution begins by specifying its id as the fragment identifier in the URL. If you don't specify a dialog, execution begins with the first form or menu in the document.

<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
<form>
  <block>
    <goto next="document2.vxml#infinity"/>
  </block>
</form>
</vxml>
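The target side of this navigation can be sketched as follows. The contents of "document2.vxml" shown here are hypothetical, including the "start" id and the prompt text; the point is only to show which dialog is entered with and without a fragment identifier.

```xml
<!-- document2.vxml (hypothetical contents) -->
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <!-- first dialog: entered by goto next="document2.vxml" -->
  <form id="start">
    <block>
      Starting at the top.
      <exit/>
    </block>
  </form>

  <!-- entered by goto next="document2.vxml#infinity" -->
  <form id="infinity">
    <block>
      and beyond
      <exit/>
    </block>
  </form>
</vxml>
```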

You've learned how to use blocks to play audio and redirect control. Now it's time to learn how to get input from the user. The primary method of gathering user input is via the field element, which is another type of form item. Although block and field elements are both form items, field is a special kind of form item. A field is a blank slate waiting to be filled by user input, whereas a block is capable only of executing instructions such as audio and goto elements.

For each field, you must provide a grammar, which is a piece of information describing the allowable user input for a given field. The VoiceXML interpreter uses this information to determine whether the user's response was meaningful for this particular field.

A field element has several important components: a name, one or more prompts, one or more grammars, and instructions to be executed when the field has been filled. The name attribute of the field identifies the variable associated with the user input you collect. You use one or more prompt elements to instruct the user on how to fill the form item variable. You use a filled element to provide instructions to be executed when the field element's form item variable has been filled.

You can specify grammars in one of two ways:

  • You can set the type attribute of the field element. This causes the VoiceXML interpreter to use one of its "built-in" grammars.
  • You can use one or more grammar elements. A grammar can contain arbitrary words and phrases expressed in a special grammar description language.

Here's a simple example:

<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="famous" type="boolean">
      <prompt>
      would you like to be famous?
      </prompt>

      <filled>
      got it!
      <if cond="famous">
        let's schedule an audition
        <goto next="schedule.vxml" />
      <else />
        infamous is a reasonable 
        alternative.
        <goto next="infamous.vxml" />
      </if>
      </filled>
    </field>
  </form>
</vxml>

This example includes a VoiceXML document containing a single form with a single field. The field first queues the prompt, "would you like to be famous?" Although we used TTS here, the prompt element also allows any number of audio elements, so you can reference recorded audio instead.

This example takes the easy way out by referencing the boolean grammar built into the platform. The boolean grammar allows phrases like "yes", "okay", "no", and "nope". Production-quality applications should use one or more grammar elements, which allow the grammar to be customized and tuned for superior performance and quality. See the Grammar Tutorial to learn about Tellme's grammar support.
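The two refinements just mentioned, a recorded prompt and a custom grammar, might be combined as in the following sketch. It is an illustration under assumptions rather than a recommended grammar: the audio file "ui/famous.wav" is hypothetical, the phrase list is arbitrary, and the inline grammar uses the W3C SRGS XML form. Consult the Grammar Tutorial for the grammar formats that Tellme supports and for how a grammar returns a value to the field.

```xml
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="famous">
      <prompt>
        <!-- hypothetical recording, with TTS as the fallback -->
        <audio src="ui/famous.wav">would you like to be famous?</audio>
      </prompt>

      <!-- inline SRGS XML grammar; the phrases are illustrative only -->
      <grammar mode="voice" xml:lang="en-US" version="1.0" root="answer">
        <rule id="answer" scope="public">
          <one-of>
            <item>yes</item>
            <item>sure</item>
            <item>no</item>
            <item>no thanks</item>
          </one-of>
        </rule>
      </grammar>

      <filled>
      got it!
      </filled>
    </field>
  </form>
</vxml>
```

The filled element here deliberately avoids branching on the value of "famous", because the value assigned by a custom grammar depends on how the grammar is written. The discussion that follows returns to the built-in boolean version.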

When the user says one of the phrases permitted by the boolean grammar, the form item variable associated with the field, "famous", is set to ECMAScript true or false, and the filled element is executed. The filled element is a lot like a block element: it can contain a set of commands that are executed sequentially. In fact, if you check the VoiceXML 2.x Element Reference and compare the list of child elements allowed by both the block and filled elements, you will discover that they are the same. Both block and filled are containers for "executable content".

Much like the block elements in the examples above, the filled element in the previous example contains text that the interpreter adds to the prompt queue. It also contains another control element, if, which executes a set of commands if its cond attribute evaluates to true. In the example, the condition is the value of the form item variable, "famous". If the value of famous is ECMAScript true, the interpreter queues the TTS "let's schedule an audition" and navigates to "schedule.vxml". If the value of famous is ECMAScript false, the interpreter queues the TTS "infamous is a reasonable alternative" and navigates to "infamous.vxml".

No matter how precise your prompts, users won't always provide the input allowed by your grammar. Sometimes they won't provide any input at all. When these situations occur, the interpreter throws either a nomatch or a noinput event to your application. To prevent your application from bailing out when one of these events occurs, you need to author one or more event handlers.

The following example includes a nomatch and a noinput handler. Copy and paste this example into your Tellme Studio Scratchpad to see how the interpreter behaves when you provide unexpected input such as "maybe" or "ask me later". Remain silent for several seconds after the prompt is played to see how a noinput is handled.

<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="famous" type="boolean">
      <prompt>
      would you like to be famous?
      </prompt>

      <noinput>
      Sorry I didn't hear you.
      For famous press 1.
      For infamous press 2.      
      </noinput>

      <nomatch>
      Sorry I didn't get that.
      To be famous say yes.
      Otherwise, say no.    
      </nomatch>

      <filled>
      got it!
      <if cond="famous">
        let's schedule an audition
      <else />
        infamous is a reasonable 
        alternative.
      </if>
      <clear/>
      </filled>
    </field>
  </form>
</vxml>

To learn more about event handling, see the Event Handling Tutorial.

This example also introduces the clear element, which, as its name implies, clears the value of any form item variables, in this case the variable "famous". It also reinitializes the counters maintained for each event that was thrown. You can learn more about event counters in the Event Handling Tutorial.
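As a small sketch of how event counters and clear interact, the following variation uses the count attribute on two noinput handlers and names the variable to clear explicitly with the namelist attribute. The handler wording is made up for illustration; see the Event Handling Tutorial for the full rules.

```xml
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="famous" type="boolean">
      <prompt>
      would you like to be famous?
      </prompt>

      <!-- selected on the first noinput event -->
      <noinput count="1">
      Sorry I didn't hear you.
      </noinput>

      <!-- selected on the second and later noinput events -->
      <noinput count="2">
      I still didn't hear you. Goodbye.
      <exit/>
      </noinput>

      <filled>
      got it!
      <!-- clears only "famous" and resets its event counters -->
      <clear namelist="famous"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because clear resets the counters, a caller who answers once and then falls silent hears the first noinput handler again rather than the second.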
