Tellme Studio

Using fetchaudio
Last Updated: 09-02-2005

VoiceXML applications are composed of one or more VoiceXML documents, audio files, grammars, scripts, and data. A VoiceXML interpreter uses HTTP to retrieve these application resources. Sometimes, the transition between documents, the invocation of a subdialog, or the fetching of data can introduce some latency in your application that requires the user to wait.

Latencies in a voice application are detrimental to the user experience. During the development and testing of a voice application, you should identify and eliminate latencies throughout your application. The gaps that remain should be minimized as much as possible and filled with audio that gives the user the sense that the application is still functioning. This article first discusses probable causes of latencies in your voice application and how to eliminate them. The article then explains how to use the fetchaudio attribute to mask unavoidable latencies in your voice application.

You can discover latencies in your voice application by running it and listening for dead air as the application transitions between states. You can also use the Tellme Studio debug log viewer to analyze the precise amount of time it takes the VoiceXML interpreter to fetch each of the resources that comprise your application.

The following output from the debug log shows HTTP requests for two VoiceXML documents. The first request took 80 milliseconds; the second took 200 milliseconds.

[04/02/02 17:31:21] LOG IN
[04/02/02 17:31:22] FETCH http://www.acme.net/trading/index.vxml LOADED 80ms
[04/02/02 17:31:22] FETCH http://www.acme.net/trading/app_root.vxml LOADED 200ms

Once you have discovered a latency in your voice application, you need to determine its cause. Only then can you take steps to eliminate it. The following is a list of potential causes of voice application latency, with suggestions on how to eliminate each:

  • Some latencies result from excessive computation. You should verify how often you need to perform those computations. If, for example, a result doesn't change after you've computed it, consider caching the result at application scope.
  • File fetches can be expensive. While it is tempting to modularize code, don't overdo it. If, for example, you have a command grammar that you use in only one place in your application, you are better off inlining it than referencing it as an external file. If your application references external resources, such as JavaScript or command grammars, that are used by multiple documents in your application, consider combining those documents. Alternatively, you can develop a build process that pre-processes your application documents and inlines externally referenced static resources before you publish your application to the Web.
  • Your VoiceXML interpreter always fetches the documents that make up your application from the Web server. Effective use of HTTP caching protocols can yield significant performance gains. For more information about caching, see Effective Use of Caching to Boost VoiceXML Application Performance.
  • Your content is hosted on a slow Web server with an unreliable connection to the Internet. By hosting your application on one or more high-performance Web servers with high-speed connections to the Internet, you can better ensure that your content is available when the VoiceXML interpreter needs it.
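
The first suggestion, caching a computed result at application scope, might look like the following sketch. It assumes that the application root document declares a variable named cachedRate with no initial value. The variable name, the computeRate function, and the value it returns are all hypothetical and stand in for whatever expensive computation your application performs.

```xml
<?xml version="1.0"?>
<!-- Leaf document. Assumes root.vxml declares <var name="cachedRate"/>,
     which leaves the variable undefined until it is first assigned. -->
<vxml version="2.1"
  application="root.vxml"
  xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // Hypothetical stand-in for an expensive computation.
      function computeRate() {
        return 42;
      }
    ]]>
  </script>
  <form id="rate">
    <block>
      <!-- Compute only when no earlier document has stored a result. -->
      <if cond="application.cachedRate == undefined">
        <assign name="application.cachedRate" expr="computeRate()"/>
      </if>
      <prompt>The rate is <value expr="application.cachedRate"/>.</prompt>
    </block>
  </form>
</vxml>
```

Because the variable lives in the root document, the stored value remains available as the caller moves between leaf documents that share the same root.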

Investigating each of these possibilities will solve most application latency issues. Occasionally, however, you'll need to mask an unavoidable latency. For instance, suppose your application needs to fetch a proprietary stock feed that is accessed using a server-side script that applies business logic before transmitting the data to your application. Furthermore, suppose this process is dynamic, meaning that the results can't be cached by the interpreter. When this situation arises, you'll need to use the fetchaudio attribute to hide the latency from your users.

The fetchaudio attribute is allowed on the following elements:

You set the value of the fetchaudio attribute to a URI that references an audio file. The audio formats supported by the fetchaudio attribute are the same as those supported by the audio element. The audio is played by the VoiceXML interpreter while the interpreter loads and processes the requested resource. The audio will be interrupted by the interpreter when the transition has completed and the next chunk of audio is ready to be played. Thus, fetchaudio can be used effectively when a high latency fetch would otherwise cause a noticeable gap in the voice user interface.
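
As a minimal sketch, the following form plays a recording while the interpreter fetches the next document. The document name slowReport.vxml and the audio file audio/pleasewait.wav are hypothetical. The goto element is used here because it is one of the elements on which VoiceXML defines the fetchaudio attribute.

```xml
<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="start">
    <block>
      <!-- pleasewait.wav plays while slowReport.vxml is fetched and
           processed; the interpreter stops it once the next audio is ready. -->
      <goto next="slowReport.vxml" fetchaudio="audio/pleasewait.wav"/>
    </block>
  </form>
</vxml>
```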

In addition to being interruptible by the interpreter, fetchaudio has other characteristics that distinguish it from the audio element. For one, fetchaudio flushes the prompt queue immediately. This means, for example, that any audio queued in a block that precedes the element that specifies the transition or data request will be played before the fetchaudio. In addition, because grammar information is unavailable during the fetchaudio, neither the prompts that precede the fetchaudio nor the fetchaudio itself allows the user to barge in or be recognized. An example illustrates this point more clearly:

<?xml version="1.0"?>
<vxml version="2.1"
  application="root.vxml"
  xmlns="http://www.w3.org/2001/vxml">
  <form id="one">
    <field name="confirmation" type="boolean">
      <prompt>Are you sure you want the stock information?</prompt>
      <filled>
        <if cond="confirmation">
          <prompt>Your current stock is trading at</prompt>
          <goto next="#two"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="two">
    <script src="stockParsingFunctions.js"/>
    <data name="stockData" 
     expr="'myVerySlowQuoteInformation.pl?stockname=' + 
           application.currentStockName"/>
    <field name="again" type="boolean">
      <prompt>
        <value expr="stockDollars(stockData)"/> dollars and 
        <value expr="stockCents(stockData)"/> cents.  
        Do you want another quote?
      </prompt>
      <filled>
        <if cond="again">
          <goto next="getNextStock.vxml"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

If "Your current stock is trading at" takes 3.5 seconds to play and the slow data fetch takes about 6 seconds, the following timeline represents what happens after the user presses 1 in the first form:

Now, let's modify the data element to include a fetchaudio attribute as follows:

<data name="stockData" fetchaudio="http://www.acme-trading.net/audio/music.wav" 
   expr="'myVerySlowQuoteInformation.pl?stockname=' + application.currentStockName"/>

The resulting timeline follows:

Clearly the application developer did not intend to play the fetchaudio music between "Your current stock is trading at" and "5 dollars and 6 cents." To design the application properly to use fetchaudio, the application developer should insert a prompt such as "Please hang on; this may take a while" before the fetchaudio music, and move all prompts related to the amount to the second form following the data fetch. The revised code follows:

<?xml version="1.0"?>
<vxml version="2.1"
 application="root.vxml"
 xmlns="http://www.w3.org/2001/vxml">
  <form id="one">
    <field name="confirmation" type="boolean">
      <prompt>Are you sure you want the stock information?</prompt>
      <filled>
        <if cond="confirmation">
          <prompt>Please hang on; this may take a while</prompt>
          <goto next="#two"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>

  <form id="two">
    <script src="stockParsingFunctions.js"/>
    <data name="stockData" fetchaudio="http://www.acme-trading.net/audio/music.wav"
     expr="'myVerySlowQuoteInformation.pl?stockname=' + 
           application.currentStockName"/>
    <field name="again" type="boolean">
      <prompt>Your current stock is trading at</prompt>
      <prompt>
        <value expr="stockDollars(stockData)"/> dollars and 
        <value expr="stockCents(stockData)"/> cents.  
        Do you want another quote?
      </prompt>
      <filled>
        <if cond="again">
          <goto next="getNextStock.vxml"/>
        <else/>
          <goto next="mainmenu.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>

The corresponding timeline follows:

If your application performs multiple consecutive fetches and the combined latency would be noticeable, add the fetchaudio attribute only to the first element that performs a fetch. The audio associated with the fetchaudio plays until all fetches have completed and the interpreter is ready for input. Make sure the fetchaudio clip is long enough to cover all of the fetches.

<form>
  <data name="data1" src="data1.cgi" fetchaudio="processing.wav"/>
  <data name="data2" src="data2.cgi" />
  <data name="data3" src="data3.cgi" />
</form>

The following are the important points to remember when you use fetchaudio:

  • All prompts queued prior to the fetchaudio will be played prior to the fetchaudio.
  • All prompts played before the fetchaudio are non-bargeable.
  • Audio associated with fetchaudio is also non-bargeable.
  • All prompts before the fetchaudio are guaranteed to be played to completion, regardless of when the fetch finishes. See Known issues for important information regarding this point.
  • Audio associated with fetchaudio will be interrupted as soon as the interpreter is ready to flush the next set of prompts from the queue.
  • When several consecutive elements specify fetchaudio, the interpreter interrupts the previous fetchaudio when it begins to process the next element that has a fetchaudio attribute.
  • The interpreter does not loop the audio associated with fetchaudio. If the interpreter is still processing the associated request when the fetchaudio playback has completed, dead air will result. To avoid this situation, Tellme recommends that your fetchaudio be much longer than your fetch. Consider a five-minute audio clip. Because the Tellme Voice Application Network streams audio, there is no performance penalty for large audio files.
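
To illustrate the point about consecutive elements that each specify fetchaudio, consider the following sketch. The script names and audio files are hypothetical. Based on the behavior described in the list above, the interpreter would be expected to cut off lookup.wav when it begins to process the second data element and starts pricing.wav.

```xml
<form id="quotes">
  <!-- lookup.wav plays while accountLookup.cgi is fetched. -->
  <data name="account" src="accountLookup.cgi" fetchaudio="lookup.wav"/>
  <!-- Processing this element interrupts lookup.wav and starts pricing.wav. -->
  <data name="prices" src="priceLookup.cgi" fetchaudio="pricing.wav"/>
  <block>
    <prompt>Your information is ready.</prompt>
  </block>
</form>
```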

The following is a list of known issues related to fetchaudio:

  • If the interpreter executes fetchaudio, completes the retrieval of the associated content, queues additional audio, and then encounters another fetchaudio, the queued audio is not always played to completion. This bug will be fixed in a future release of the Tellme Voice Application Network.
  • VoiceXML specifies three properties that complement the fetchaudio attribute: fetchaudio, fetchaudiodelay, and fetchaudiominimum. These properties are not currently supported by the Tellme VoiceXML interpreter. For more information on these properties, please see Fetching Properties in the VoiceXML specification.
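
For reference, on an interpreter that does support these properties, they are set with the property element. The sketch below uses the example values that the VoiceXML specification gives for the platform-dependent defaults (2s and 5s). The audio file name and the target document are hypothetical. As noted above, the Tellme VoiceXML interpreter does not currently support these properties.

```xml
<?xml version="1.0"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Default audio for fetches that do not specify their own fetchaudio. -->
  <property name="fetchaudio" value="audio/processing.wav"/>
  <!-- Wait this long after a fetch starts before playing the audio. -->
  <property name="fetchaudiodelay" value="2s"/>
  <!-- Once the audio starts, play at least this long even if the fetch returns. -->
  <property name="fetchaudiominimum" value="5s"/>
  <form id="start">
    <block>
      <goto next="slowReport.vxml"/>
    </block>
  </form>
</vxml>
```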
See Also
Minimizing the Duration of the Hourglass