VoiceXML applications running on the Tellme Voice Application Network leverage Tellme Network's computational resources to provide the best caller experience possible. Ideally, the demand an application places on the network will be transparent to the caller, and the system will seem instantaneously responsive regardless of the amount of data being processed. In reality, Automatic Speech Recognition (ASR) is computationally intensive, and its demands increase with the complexity of the recognition. In particular, the caller may perceive latency before the system responds when a large grammar requires extensive computation for certain input. By default, the Tellme Voice Application Network covers this latency with a percolating sound, also known as the Tellme hourglass. The hourglass provides the user with a hint that the system is processing the input.
For some applications, the default hourglass may sound inappropriate. You can apply techniques to minimize the duration of the hourglass, to customize the hourglass audio, or to eliminate the hourglass entirely. This document explains how.
- 1. Understanding when the hourglass is played
- 2. Using fetchaudio to minimize the hourglass
- 3. Choosing between hourglass and fetchaudio
- 4. Customizing the hourglass
1. Understanding when the hourglass is played
The Tellme VoiceXML interpreter plays an hourglass to give the user audible feedback during recognition of voice input. By default, the hourglass continues while the interpreter fetches additional resources such as scripts, data, or the next VoiceXML document. The hourglass terminates when the interpreter reaches the next listen state, at which time the prompt queue is flushed.
The following timeline illustrates the default duration of the hourglass, labeled "h".
When the interpreter executes the following example, it begins playing the hourglass during recognition. If recognition is successful, the interpreter continues to play the hourglass while the CGI is executed and until the interpreter flushes the audio queue or enters a listen state.
<vxml version="2.1">
  <form>
    <field name="emp">
      <prompt>Say the name of an employee</prompt>
      <grammar mode="voice" type="application/srgs+xml" src="long-employee-list.grxml"/>
      <noinput>
        Sorry, I didn't hear you.
        <reprompt/>
      </noinput>
      <nomatch>
        Sorry, I didn't get that.
        <reprompt/>
      </nomatch>
      <filled>
        <submit next="get-emp-info.cgi" namelist="emp" />
      </filled>
    </field>
  </form>
</vxml>
Note. The hourglass is not played when the interpreter processes DTMF input or if it doesn't detect speech input (noinput) from the user.
2. Using fetchaudio to minimize the hourglass
As described in the previous section, the default duration of the hourglass consists of two intervals:
- From end of speech detection to end of recognition.
- From end of recognition until the next listen state is reached.
In most cases, caller-perceived latency in VoiceXML applications occurs during the second interval, when the interpreter typically fetches documents over the Internet. For this reason, the VoiceXML specification includes a fetchaudio attribute on elements that perform fetches. The attribute lets the application specify audio that plays from the moment the fetch is attempted until another fetch with fetchaudio begins or the next listen state is reached. The hourglass terminates as soon as the fetchaudio begins playing. Fetchaudio is described in detail in the fetchaudio tutorial.
By employing fetchaudio, you can reduce the duration of the hourglass so that it only plays during the first interval. The fetchaudio covers the second interval. In the following timeline, the hourglass is played during the interval labeled "h", and fetchaudio is played during the interval labeled "FA".
The following example is identical to the code example in the previous section with one exception: a fetchaudio attribute has been added to the submit element. When the interpreter executes the submit element, immediately after recognition, it fetches and plays the audio associated with the fetchaudio attribute.
<vxml version="2.1">
  <form>
    <field name="emp">
      <prompt>Say the name of an employee</prompt>
      <grammar mode="voice" type="application/srgs+xml" src="long-employee-list.grxml"/>
      <noinput>
        Sorry, I didn't hear you.
        <reprompt/>
      </noinput>
      <nomatch>
        Sorry, I didn't get that.
        <reprompt/>
      </nomatch>
      <filled>
        <submit fetchaudio="silence.wav" next="get-emp-info.cgi" namelist="emp" />
      </filled>
    </field>
  </form>
</vxml>
If you want to minimize the duration of the hourglass without playing any audible fetchaudio, set the fetchaudio attribute to a short (10ms) audio clip of recorded silence. This minimal clip is enough to stop the hourglass. Note, however, that if fetching and processing the resource takes a substantial amount of time, the user will experience dead air.
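VoiceXML 2.0 also defines a standard fetchaudio property, which supplies the default for the fetchaudio attribute on elements that fetch documents. The following sketch assumes the Tellme interpreter honors that property. It applies the silent clip to every document fetch in the file rather than repeating the attribute on each submit. The file name silence.wav is a placeholder for your own recording.

```xml
<vxml version="2.1">
  <!-- Assumption: the interpreter honors the standard VoiceXML 2.0
       "fetchaudio" property. silence.wav is a placeholder for a short
       clip of recorded silence. -->
  <property name="fetchaudio" value="silence.wav"/>
  <form>
    <field name="emp">
      <prompt>Say the name of an employee</prompt>
      <grammar mode="voice" type="application/srgs+xml" src="long-employee-list.grxml"/>
      <filled>
        <!-- No fetchaudio attribute here: the document-level property applies. -->
        <submit next="get-emp-info.cgi" namelist="emp"/>
      </filled>
    </field>
  </form>
</vxml>
```

If one particular fetch is known to be slow, setting the fetchaudio attribute on that element to an audible clip would override the silent default for that fetch only and avoid dead air.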
3. Choosing between hourglass and fetchaudio
Using fetchaudio to minimize the hourglass is not appropriate in every case. If the grammar is large and recognition takes a long time, the hourglass plays long enough for the user to notice it. When recognition ends and the fetch begins, the user also notices the fetchaudio abruptly interrupting the hourglass.
In general, you should follow these principles:
- Minimize the hourglass when the typical recognition time is short relative to the latency of the subsequent fetch.
- Customize the hourglass and avoid fetchaudio when the typical recognition takes a long time.
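As a minimal sketch of the second principle, the field below uses a large grammar, replaces the default hourglass audio through the tellme.hourglass.url property (covered in the section on customizing the hourglass), and omits fetchaudio so that nothing interrupts the hourglass when the fetch begins. The audio URL is a placeholder, not a real resource.

```xml
<vxml version="2.1">
  <form>
    <field name="emp">
      <!-- Placeholder URL: substitute your own hourglass recording. -->
      <property name="tellme.hourglass.url"
                value="http://www.example.com/audio/custom-hourglass.wav"/>
      <prompt>Say the name of an employee</prompt>
      <grammar mode="voice" type="application/srgs+xml" src="long-employee-list.grxml"/>
      <filled>
        <!-- No fetchaudio: the custom hourglass keeps playing through the
             fetch until the next listen state. -->
        <submit next="get-emp-info.cgi" namelist="emp"/>
      </filled>
    </field>
  </form>
</vxml>
```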
4. Customizing the hourglass
You can customize the hourglass using the property element. The Tellme VoiceXML interpreter supports the following property names:
- tellme.hourglass.active: Indicates whether the hourglass is played during recognition. The default is 'true'. Set the value to 'false' to disable the hourglass.
- tellme.hourglass.url: Specifies the URL of the audio file to play when the hourglass is enabled. For a list of supported audio file formats, see the audio element.
All elements contained within the parent element of a property element inherit the setting established by the property; thus, you can set the hourglass properties at application, document, and dialog scopes rather than set the properties on every field. If, however, you set these properties at an outer scope, you can override their behavior by setting them explicitly at an inner scope.
In the following example, the tellme.hourglass.url property is set in the application root document, 'app_root.vxml'. The interpreter plays the .wav associated with this property whenever the hourglass is needed during execution of this application. Although the tellme.hourglass.active property is not set explicitly in the application root, it is active by default. In the document 'doc1.vxml', the hourglass is disabled for all recognitions with the exception of the field named 'field2'.
<!-- app_root.vxml -->
<vxml version="2.1">
  <property name="tellme.hourglass.url" value="http://audio.acme-airlines.net/vxml/hourglass.wav"/>
</vxml>
<!-- doc1.vxml -->
<vxml version="2.1" application="app_root.vxml">
  <!-- disable hourglass for all recognition in this document -->
  <property name="tellme.hourglass.active" value="false"/>
  <form>
    <field name="field1">
      <!-- hourglass is disabled -->
    </field>
    <field name="field2">
      <!-- override doc setting, and enable hourglass for this field -->
      <property name="tellme.hourglass.active" value="true"/>
    </field>
  </form>
</vxml>
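The example above sets the properties at application, document, and field scope. Dialog scope works the same way. The following sketch uses a hypothetical second document, 'doc2.vxml', with illustrative form and field names, and disables the hourglass for one form while leaving another form to inherit the application root settings.

```xml
<!-- doc2.vxml (hypothetical document; form and field names are illustrative) -->
<vxml version="2.1" application="app_root.vxml">
  <form id="quick_menu">
    <!-- dialog scope: disables the hourglass for every field in this form -->
    <property name="tellme.hourglass.active" value="false"/>
    <field name="choice">
      <!-- hourglass is disabled -->
    </field>
  </form>
  <form id="employee_search">
    <!-- no hourglass properties set here: fields inherit the application
         root settings, so the custom hourglass.wav is played -->
    <field name="emp">
      <!-- hourglass is enabled -->
    </field>
  </form>
</vxml>
```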