Creating and Manipulating Utterance Recordings
Last Updated: 11-07-2005

While you can use the record element to record unconstrained user input and then submit those recordings to an HTTP server for further processing, the Tellme VoiceXML interpreter also supports recording during recognition as described in the VoiceXML 2.1 specification.

This document explains how to enable utterance recordings in your application and how to submit an utterance recording to an HTTP server.

To enable utterance recordings, set the recordutterance property to true.

<property name="recordutterance" value="true"/>

The "recordutterance" property replaces the Tellme proprietary "tellme.field.recordutterance" property.

Note. Although you can enable this feature at any scope, you will achieve better application performance by enabling it only where you plan to use the recording.
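As a minimal sketch of narrowing the scope, the property can be set inside the one field whose audio you intend to keep, so that other fields in the same form are not recorded. The "account" field and its prompt are hypothetical; the grammar reference is reused from the examples in this document.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- hypothetical field: the property is not set here,
      so no utterance recording is requested -->
    <field name="account" type="digits">
      <prompt>
       Say your account number.
      </prompt>
    </field>

    <field name="city_state">
      <!-- recording is enabled only while this field is active -->
      <property name="recordutterance" value="true"/>

      <prompt>
       Say a city and state.
      </prompt>

      <grammar type="application/srgs+xml"
        mode="voice" src="citystate.gsl"/>
    </field>
  </form>
</vxml>
```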

The Tellme VoiceXML interpreter supports utterance recordings during execution of field, initial, and menu. Utterance recordings are also available when a link is matched.
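The following sketch illustrates the link case under stated assumptions: the "operator" event name and the inline grammar are hypothetical, and the recording of the link match is assumed to be readable from application.lastresult$, as it is for other recognitions.

```xml
<?xml version="1.0"?>
<vxml version="2.1" xml:lang="en-US"
 xmlns="http://www.w3.org/2001/vxml">
  <!-- enabled at document scope so that a link match is recorded too -->
  <property name="recordutterance" value="true"/>

  <!-- hypothetical link: saying "operator" throws the operator event -->
  <link event="operator">
    <grammar type="application/srgs+xml" mode="voice"
      version="1.0" root="op">
      <rule id="op">operator</rule>
    </grammar>
  </link>

  <catch event="operator">
    <!-- assumption: the recording of the matched link is
      available in application.lastresult$ -->
    I heard you say
    <audio expr="application.lastresult$.recording"/>
    <goto next="xfer.vxml"/>
  </catch>

  <form>
    <field name="city_state">
      <prompt>
       Say a city and state.
      </prompt>

      <grammar type="application/srgs+xml"
        mode="voice" src="citystate.gsl"/>
    </field>
  </form>
</vxml>
```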

Each time the Tellme VoiceXML interpreter attempts a recognition while executing a field, initial, or menu, one of the following will occur:

  1. The recognizer detects and recognizes the user's input.
  2. The recognizer detects user input, but recognition fails, and the interpreter throws a nomatch event.
  3. The recognizer doesn't detect any user input, and the interpreter throws a noinput event.

If recognition succeeds and utterance recording is enabled, the interpreter stores a reference to the recording in the recording property of the shadow variable of the active field or initial (for example, city_state$.recording for a field named city_state).

If recognition succeeds or a nomatch occurs, the interpreter also stores a reference to the recording in application.lastresult$.recording.

If a noinput occurs, an utterance recording is not created, and the recording shadow and lastresult$ variables are undefined.

If an utterance recording is created, the interpreter also exposes the size (in bytes) and duration (in milliseconds) of the recording in the recordingsize and recordingduration properties, respectively, alongside the recording property.
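A minimal sketch of reading these values, assuming the size and duration sit next to the recording reference on both the field's shadow variable and application.lastresult$. The log text is illustrative only.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <property name="recordutterance" value="true"/>

    <field name="city_state">
      <prompt>
       Say a city and state.
      </prompt>

      <grammar type="application/srgs+xml"
        mode="voice" src="citystate.gsl"/>

      <nomatch>
        <!-- after a nomatch, read the values from lastresult$ -->
        <log>
          nomatch bytes: <value expr="application.lastresult$.recordingsize"/>
          ms: <value expr="application.lastresult$.recordingduration"/>
        </log>
        <reprompt/>
      </nomatch>

      <filled>
        <!-- after a successful recognition, the field's
          shadow variable carries the same three properties -->
        <log>
          match bytes: <value expr="city_state$.recordingsize"/>
          ms: <value expr="city_state$.recordingduration"/>
        </log>
        You said
        <audio expr="city_state$.recording"/>
        <exit/>
      </filled>
    </field>
  </form>
</vxml>
```

Because no recording is created on a noinput, a noinput handler should not rely on any of these properties being defined.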

The following example enables utterance recording and prompts the user for a city and state. On the third nomatch, the interpreter plays back the recording of the unrecognized user utterance referenced by lastresult$.recording.

<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <form>
    <property name="recordutterance" value="true"/>

    <field name="city_state">
     <prompt>
      Say a city and state.
     </prompt>

     <grammar type="application/srgs+xml" 
       mode="voice" src="citystate.gsl"/>

     <nomatch>
      I'm sorry. I didn't get that.
      <reprompt/>
     </nomatch>

     <nomatch count="3">
      I heard you say 
      <audio expr="lastresult$.recording"/>
      Please try again.
     </nomatch>

     <filled>
       You said <value expr="city_state"/>
       <exit/>
     </filled>
    </field>  
  </form>
</vxml>


The example in the previous section played back the unrecognized utterance to the user. Instead of playing back the recording to the user, you are more likely to submit the recording to an HTTP server for further application tuning or for use by a call center agent. The following example accomplishes this using the data element. Observe that the size and duration of the recording are also submitted.

<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <!-- in a real app, the user might supply
   an account number as input -->
  <var name="account" expr="1234"/>

  <form id="get_city">
    <!-- enable recognition recording for this dialog -->
    <property name="recordutterance" value="true"/>

    <field name="city_state">
     <prompt>
      Say a city and state.
     </prompt>

     <grammar type="application/srgs+xml" 
       mode="voice" src="citystate.gsl"/>

     <nomatch>
      I'm sorry. I didn't get that.
      <reprompt/>
     </nomatch>

     <nomatch count="3">
      Sorry you're having trouble.
      <!-- assign the recognition recording to a temp variable  
        merely to interface with the parameter name 
        expected by the server-side script
      -->
      <var name="clip" expr="lastresult$.recording"/>
      <var name="size" expr="lastresult$.recordingsize"/>
      <var name="duration" expr="lastresult$.recordingduration"/>
      <!-- submit the account number and 
      utterance recording data to an HTTP server
      -->
      <data name="dom" src="persist.cgi" 
        method="post"
        enctype="multipart/form-data"
        namelist="account clip size duration"/>
      <!-- now transfer to a call center -->
      <goto next="xfer.vxml"/>
     </nomatch>

     <filled>
       You said <value expr="city_state"/>
       <!-- in a real app, save this data,
         and navigate to the next state -->
       <exit/>
     </filled>
    </field>  
  </form>
</vxml>


You can also use the subdialog and submit elements to submit your recording to an HTTP server. Regardless of the element you use, you should set the method attribute of that element to "post" and the enctype attribute to "multipart/form-data". RFC 2388 describes this encoding in detail. Most server-side frameworks provide an API for decoding HTTP requests using this encoding. Several examples are provided in the record element reference. Please see your server-side framework documentation for specific details on accessing HTTP file uploads.
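As a sketch of the submit variant, the following reuses the account variable and persist.cgi script from the previous example and sets both attributes explicitly. It assumes that persist.cgi responds with the next VoiceXML document to execute, since submit transfers control to the document that the server returns. The subdialog element accepts the same method, enctype, and namelist attributes.

```xml
<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">
  <var name="account" expr="1234"/>

  <form id="get_city">
    <property name="recordutterance" value="true"/>

    <field name="city_state">
      <prompt>
       Say a city and state.
      </prompt>

      <grammar type="application/srgs+xml"
        mode="voice" src="citystate.gsl"/>

      <nomatch count="3">
        Sorry you're having trouble.
        <!-- copy the recording to the parameter name
          expected by the server-side script -->
        <var name="clip" expr="application.lastresult$.recording"/>
        <!-- post the recording as a multipart file upload;
          the response is assumed to be a VoiceXML document -->
        <submit next="persist.cgi" method="post"
          enctype="multipart/form-data"
          namelist="account clip"/>
      </nomatch>
    </field>
  </form>
</vxml>
```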

Prior to Revision 3 and the implementation of VoiceXML 2.1 on the Tellme Voice Application Network, you could only submit an utterance recording to an HTTP server by first assigning a reference to the recording to the form item variable of a record element. To prevent the record element from being selected by the Form Interpretation Algorithm (FIA), you would set its cond attribute to false.

In the following example, the form "get_city" includes a record element and a field element. Because the cond attribute of the record element evaluates to false, the record will not be selected by the FIA. On the third nomatch, the recording shadow variable of the field is assigned to the record form item variable. The record form item variable is then submitted to an HTTP server.

<?xml version="1.0"?>
<vxml version="2.0"
 xmlns="http://www.w3.org/2001/vxml">
  <!-- in a real app, the user might supply
   an account number as input -->
  <var name="account" expr="1234"/>

  <form id="get_city">
    <!-- enable recognition recording for this dialog -->
    <property name="recordutterance" value="true"/>

    <record name="clip" cond="false"/>

    <field name="city_state">
     <prompt>
      Say a city and state.
     </prompt>

     <grammar type="application/srgs+xml" 
       mode="voice" src="citystate.gsl"/>

     <nomatch>
      I'm sorry. I didn't get that.
      <reprompt/>
     </nomatch>

     <nomatch count="3">
      Sorry you're having trouble.
      <!-- assign the recognition recording to a record item  -->
      <assign name="clip" expr="city_state$.recording"/>
      <!-- submit the account number and 
      recognition recording to an HTTP server
      -->
      <data name="dom" src="persist.cgi" 
        method="post"
        enctype="multipart/form-data"
        namelist="account clip"/>
      <!-- now transfer to a call center -->
      <goto next="xfer.vxml"/>
     </nomatch>

     <filled>
       You said <value expr="city_state"/>
       <!-- in a real app, save this data,
         and navigate to the next state -->
       <exit/>
     </filled>
    </field>  
  </form>
</vxml>


If you need to transition between forms before submitting the recording, you can assign the reference to the recording to a variable that you declare at document or application scope. Note that in an application that has not yet been migrated to Revision 3 or later, you must still assign the recording to a record element before you can submit it to an HTTP server or play it using the audio element.

<?xml version="1.0"?>
<vxml version="2.0"
 xmlns="http://www.w3.org/2001/vxml">

  <var name="recording"/>
  <!-- in a real app, you would prompt for this -->
  <var name="account" expr="1234"/>

  <form>
    <!-- enable recognition recording for this dialog -->
    <property name="recordutterance" value="true"/>

    <field name="city_state">
     <prompt>
      Say a city and state.
     </prompt>

     <grammar type="application/srgs+xml" 
       mode="voice" src="citystate.gsl"/>

     <nomatch>
      I'm sorry. I didn't get that.
      <reprompt/>
     </nomatch>

     <nomatch count="3">
      Sorry you're having trouble.
      <!-- assign the recognition recording to a temporary variable -->
      <assign name="recording" expr="lastresult$.recording"/>
      <goto next="#agent"/>
     </nomatch>

     <filled>
       You said <value expr="city_state"/>
       <!-- in a real app, save this data,
         and navigate to the next state -->
       <exit/>
     </filled>
    </field>  
  </form>

  <form id="agent">
    <record name="clip" cond="false"/>
    <block>
      <assign name="clip" expr="recording"/>
      <!-- submit the account number and 
      recognition recording to an HTTP server
      -->
      <submit next="persist.cgi" method="post"
        enctype="multipart/form-data"
        namelist="account clip"/>
    </block>
  </form>
</vxml>


The following is a list of known issues related to utterance recordings:

  • According to the VoiceXML 2.1 specification, utterance recordings are optional during execution of transfer or record. The Tellme VoiceXML interpreter does not support the creation of utterance recordings during execution of these elements.
  • The recordutterance property (and hence the deprecated tellme.field.recordutterance property) and the bargeintype property (and hence the deprecated tellme.magicword property) are incompatible. In short, if you enable utterance recordings, the interpreter automatically disables 'hotword' bargein.