Using fetchaudio
Last Updated: 09-02-2005

VoiceXML applications are composed of one or more VoiceXML documents, audio files, grammars, scripts, and data. A VoiceXML interpreter uses HTTP to retrieve these application resources. Sometimes, the transition between documents, the invocation of a subdialog, or the fetching of data can introduce some latency in your application that requires the user to wait.

Latencies in a voice application are detrimental to the user experience. During the development and testing of a voice application, you should identify and eliminate latencies throughout your application. The gaps that remain should be minimized as much as possible and filled with audio that gives the user the sense that the application is still functioning. This article first discusses probable causes of latencies in your voice application and how to eliminate them. The article then explains how to use the fetchaudio attribute to mask unavoidable latencies in your voice application.

1. Analyzing application latencies

You can discover latencies in your voice application by running it and listening for dead air as the application transitions between states. You can also use the Tellme Studio debug log viewer to see exactly how long the VoiceXML interpreter takes to fetch each of the resources that make up your application.

The following output from the debug log shows HTTP requests for two VoiceXML documents. The first request took 80 milliseconds; the second took 200 milliseconds.

[04/02/02 17:31:21] LOG IN
[04/02/02 17:31:22] FETCH http://www.acme.net/trading/index.vxml LOADED 80ms
[04/02/02 17:31:22] FETCH http://www.acme.net/trading/app_root.vxml LOADED 200ms

Once you have discovered a latency in your voice application, you need to determine its cause. Only then can you take steps to eliminate it. The following is a list of potential reasons for voice application latency and some suggestions on how to eliminate those latencies:

Working through each of these possibilities resolves most application latency issues. Occasionally, however, you'll need to mask an unavoidable latency. For instance, suppose your application fetches a proprietary stock feed through a server-side script that applies business logic before transmitting the data to your application. Furthermore, suppose the results are generated dynamically, so the interpreter can't cache them. In this situation, use the fetchaudio attribute to hide the latency from your users.

2. Using fetchaudio

The fetchaudio attribute is allowed on the following elements:

Set the value of the fetchaudio attribute to a URI that references an audio file. The fetchaudio attribute supports the same audio formats as the audio element. The VoiceXML interpreter plays the audio while it loads and processes the requested resource, and interrupts the audio when the transition has completed and the next chunk of audio is ready to be played. Thus, fetchaudio is effective when a high-latency fetch would otherwise cause a noticeable gap in the voice user interface.
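As a minimal sketch of the attribute on a transition rather than a data request, the following form submits an account number to a server-side script and plays audio while the resulting document is fetched. It assumes that your platform accepts fetchaudio on the submit element, as the VoiceXML 2.0 specification does. The script name lookupAccount.pl and the field name acctnum are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="account">
    <field name="acctnum" type="digits">
      <prompt>Please say or enter your account number.</prompt>
      <filled>
        <!-- lookupAccount.pl is a hypothetical script, for illustration only. -->
        <!-- The audio plays while the interpreter fetches and processes
             the document that the script returns. -->
        <submit next="lookupAccount.pl" namelist="acctnum"
                fetchaudio="http://www.acme-trading.net/audio/music.wav"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because the submit is the only element here that triggers a fetch, it is the natural place for the attribute.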

In addition to being interruptible by the interpreter, fetchaudio differs from the audio element in other ways. For one, fetchaudio flushes the prompt queue immediately. This means, for example, that any audio queued in a block that precedes the element specifying the transition or data request is played before the fetchaudio. In addition, because grammar information is unavailable during the fetchaudio, the user cannot barge in or be recognized during either the prompts that precede the fetchaudio or the fetchaudio itself. An example illustrates this point more clearly:

<?xml version="1.0"?>
<vxml version="2.1"
  application="root.vxml"
  xmlns="http://www.w3.org/2001/vxml">
  <form id="one">
    <field name="confirmation" type="boolean">
      <prompt>Are you sure you want the stock information?</prompt>
      <filled>
        <if cond="confirmation">
          <prompt>Your current stock is trading at</prompt>
          <goto next="#two"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="two">
    <script src="stockParsingFunctions.js"/>
    <data name="stockData" 
     expr="'myVerySlowQuoteInformation.pl?stockname=' + 
           application.currentStockName"/>
    <field name="again" type="boolean">
      <prompt>
        <value expr="stockDollars(stockData)"/> dollars and 
        <value expr="stockCents(stockData)"/> cents.  
        Do you want another quote?
      </prompt>
      <filled>
        <if cond="again">
          <goto next="getNextStock.vxml"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

If "Your current stock is trading at" takes 3.5 seconds to play and the slow data fetch takes about 6 seconds, the following timeline represents what happens after the user presses 1 in the first form:

Now, let's modify the data element to include a fetchaudio attribute as follows:

<data name="stockData" fetchaudio="http://www.acme-trading.net/audio/music.wav" 
   expr="'myVerySlowQuoteInformation.pl?stockname=' + application.currentStockName"/>

The resulting timeline follows:

Clearly, the application developer did not intend to play the fetchaudio music between "Your current stock is trading at" and "5 dollars and 6 cents." To use fetchaudio properly, the developer should play a prompt such as "Please hang on; this may take a while" before the fetchaudio music, and move all prompts related to the amount into the second form, after the data fetch. The revised code follows:

<?xml version="1.0"?>
<vxml version="2.1"
 application="root.vxml"
 xmlns="http://www.w3.org/2001/vxml">
  <form id="one">
    <field name="confirmation" type="boolean">
      <prompt>Are you sure you want the stock information?</prompt>
      <filled>
        <if cond="confirmation">
          <prompt>Please hang on; this may take a while</prompt>
          <goto next="#two"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="two">
    <script src="stockParsingFunctions.js"/>
    <data name="stockData" fetchaudio="http://www.acme-trading.net/audio/music.wav"
     expr="'myVerySlowQuoteInformation.pl?stockname=' + 
           application.currentStockName"/>
    <field name="again" type="boolean">
      <prompt>Your current stock is trading at</prompt>
      <prompt>
        <value expr="stockDollars(stockData)"/> dollars and 
        <value expr="stockCents(stockData)"/> cents.  
        Do you want another quote?
      </prompt>
      <filled>
        <if cond="again">
          <goto next="getNextStock.vxml"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

The corresponding timeline follows:

3. Using fetchaudio with multiple HTTP requests

If your application performs multiple consecutive fetches and the combined latency will be noticeable, add the fetchaudio attribute only to the first element that performs a fetch. The audio associated with the fetchaudio plays until all fetches have completed and the interpreter is ready for input. Make sure the fetchaudio clip is long enough to cover all of the fetches.

<form>
  <!-- Only the first fetch specifies fetchaudio; the audio continues through the later fetches. -->
  <data name="data1" src="data1.cgi" fetchaudio="processing.wav"/>
  <data name="data2" src="data2.cgi"/>
  <data name="data3" src="data3.cgi"/>
</form>

4. Fetchaudio summary

The following are the important points to remember when you use fetchaudio:

5. Known issues

The following is a list of known issues related to fetchaudio:

See Also
Minimizing the Duration of the Hourglass