Using Mixed Initiative

Voice applications largely consist of numerous call states that collect input from the user. The application typically prompts the user for discrete pieces of information in a pre-determined order such as a credit card number followed by an expiration date. For the user, this can become quite cumbersome, especially when she is accustomed to providing multiple pieces of information in succession without the interruption of intermediary prompts. In addition, the caller may desire to provide the pieces of information in a different order than specified by the application. Mixed initiative addresses both of these issues by allowing the flow of the call to be directed by the user as well as by the application. Effectively, the application collects multiple pieces of information in a single call state.

For example, when booking a flight, a typical human interaction follows:

Agent: Thanks for calling Acme Travel Company. How can I help you today?
Caller: I'd like to book a flight.
Agent: Okay. What is your point of origin, and where are you going?
Caller: I wanna fly from San Francisco, California to Boston, Massachusetts.
Agent: Okay, traveling from San Francisco, California to Boston, Massachusetts. Is that correct?
Caller: Yes.

This interaction consists of collecting two pieces of information using a single prompt - the origin and the destination. An alternate interaction that achieves the same end follows:

Agent: Thanks for calling Acme Travel Company. How can I help you today?
Caller: I'd like to book a flight.
Agent: Okay. What is your point of origin, and where are you going?
Caller: I wanna fly to Boston, Massachusetts.
Agent: You want to fly to Boston, Massachusetts. Where are you flying from?
Caller: From San Francisco, California.
Agent: Okay, you'll be traveling from San Francisco, California to Boston, Massachusetts. Is that correct?
Caller: Yes.

To accomplish this task using VoiceXML, you need to do the following:

  • Define subgrammars to collect each piece of information.
  • Define a form-level grammar that uses the subgrammars to collect the information.
  • Define a mixed-initiative dialog that collects input from the user.

This article walks through each of these steps in detail.

In the travel agent example, the origin and destination are similar pieces of information, and they can be described using the same data. That leaves us with the task of defining a single subgrammar. A small subset of that grammar follows:


<rule id="airports">
   <one-of>
     <item>
       <one-of>
         <item>albuquerque new_mexico </item>
         <item>a b q</item>
       </one-of>
       <tag>out = "albuquerque_nm";</tag>
     </item>
     <item>
       <one-of>
         <item>boston massachusetts</item>
         <item>b o s</item>
       </one-of>
       <tag>out = "boston_ma";</tag>
     </item>
     <item>
       <one-of>
         <item>charlotte north_carolina</item>
         <item>c l t</item>
       </one-of>
       <tag>out = "charlotte_nc";</tag>
     </item>
     <item>
       <one-of>
         <item>los angeles california</item>
         <item>l a x</item>
       </one-of>
       <tag>out = "los_angeles_ca";</tag>
     </item>
     <item>
       <one-of>
         <item>portland oregon</item>
         <item>p d x</item>
       </one-of>
       <tag>out = "portland_or";</tag>
     </item>
     <item>
       <one-of>
         <item>san francisco california</item>
         <item>s f o</item>
       </one-of>
       <tag>out = "san_francisco_ca";</tag>
     </item>
     <item>
       <one-of>
         <item>seattle washington</item>
         <item>s e a</item>
       </one-of>
       <tag>out = "seattle_wa";</tag>
     </item>
  </one-of>
</rule>


The grammar allows the user to say either the city and state in which the airport resides or the letters of the airport code. When the recognizer finds a match, it returns the city name and the two-letter state abbreviation joined by an underscore ("_"), for example "boston_ma". Spaces within a city name are also replaced by underscores, as in "san_francisco_ca".

With the Airports subgrammar defined, we now need to define a grammar that uses it to allow the caller to say sentences such as:

I want to go to Albuquerque, New Mexico from San Antonio, Texas.
I wanna fly from S F O to Boston, Massachusetts.
From Cleveland, Ohio.
To Portland, Oregon.

Here's the grammar:


<rule id="top">
  <item repeat="0-1">
    i 
    <one-of>
      <item>want to</item>
      <item>wanna</item>
    </one-of>
    <one-of>
      <item>go</item>
      <item>fly</item>
    </one-of>
  </item>
  <one-of>
    <item>
      from <ruleref uri="#airports"/>
      <tag> out.from = rules.latest(); </tag>
    </item>
    <item>
      to <ruleref uri="#airports"/>
      <tag> out.to = rules.latest(); </tag>
    </item>
    <item>
      from <ruleref uri="#airports"/>
      <tag> out.from = rules.latest(); </tag>
      to <ruleref uri="#airports"/>
      <tag> out.to = rules.latest(); </tag>
    </item>
    <item>
      to <ruleref uri="#airports"/>
      <tag> out.to = rules.latest(); </tag>
      from <ruleref uri="#airports"/>
      <tag> out.from = rules.latest(); </tag>
    </item>
  </one-of>
</rule>


A slot is a location in which the recognition engine stores the recognition result. Prior to VoiceXML 2.0, the Tellme Platform supported a single slot named "option". To support mixed initiative, the Tellme Platform has been enhanced to support multiple arbitrarily named slots. In the example above, the top-level grammar defines two slots: "from" and "to".

Depending upon what the user says, the recognizer fills the appropriate slot. The following table lists some valid sentences from the above grammar and the value stored in each slot. If a slot is not filled by a sentence, the corresponding table cell is marked n/a.

sentence                                                          from              to
I want to go to Albuquerque, New Mexico from San Antonio, Texas.  san_antonio_tx    albuquerque_nm
I wanna fly from S F O to Boston, Massachusetts.                  san_francisco_ca  boston_ma
From Cleveland, Ohio.                                             cleveland_oh      n/a
To Portland, Oregon.                                              n/a               portland_or

As you will see in the next section, the slot names can correspond directly to the value of the name attribute of the field in a mixed initiative dialog.
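
The slot-filling behavior listed in the table can be mimicked outside the recognizer with a few lines of JavaScript. The following is only a rough sketch for illustration: the function name fillSlots and the phrase table AIRPORT_PHRASES are hypothetical, and simple whole-word substring matching stands in for real grammar-based recognition.

```javascript
// Hypothetical phrase table: spoken form -> slot value, mirroring part of the
// airports subgrammar. A real application gets these values from the recognizer.
var AIRPORT_PHRASES = {
  "albuquerque new mexico": "albuquerque_nm",
  "a b q": "albuquerque_nm",
  "boston massachusetts": "boston_ma",
  "b o s": "boston_ma",
  "portland oregon": "portland_or",
  "p d x": "portland_or",
  "san francisco california": "san_francisco_ca",
  "s f o": "san_francisco_ca"
};

// Returns an object with a "from" and/or "to" property, depending on which
// slots the sentence fills. Unfilled slots are simply absent.
function fillSlots(sentence) {
  // Lowercase, drop punctuation, collapse whitespace, and pad with spaces so
  // that only whole words match.
  var text = " " + sentence.toLowerCase()
                           .replace(/[^a-z ]/g, "")
                           .replace(/\s+/g, " ")
                           .trim() + " ";
  var slots = {};
  for (var phrase in AIRPORT_PHRASES) {
    if (text.indexOf(" from " + phrase + " ") !== -1) {
      slots.from = AIRPORT_PHRASES[phrase];
    }
    if (text.indexOf(" to " + phrase + " ") !== -1) {
      slots.to = AIRPORT_PHRASES[phrase];
    }
  }
  return slots;
}

var both = fillSlots("I wanna fly from S F O to Boston, Massachusetts.");
var onlyTo = fillSlots("To Portland, Oregon.");
```

Here both carries a from value of "san_francisco_ca" and a to value of "boston_ma", while onlyTo has only a to property, matching the last row of the table.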

Now that you've defined the grammars, you can define a mixed initiative dialog that uses the top-level grammar to collect user input. A mixed initiative dialog consists of the following parts:

  • grammars defined at form scope
  • an initial element
  • a field for each piece of information to collect
  • a confirmation field to ensure that the dialog collected the information correctly

The purpose of the initial element is to prompt the user to participate in the mixed initiative dialog. The initial element shouldn't define any grammars or a filled element since it relies upon those elements at the form-level. In addition, it should appear before the other field elements in the dialog so that the VoiceXML interpreter executes it first. The recognizer attempts to match what the user says in response to the initial prompt against the form-level grammar. The initial element can contain all the standard events - noinput, nomatch, help - to aid the user in completing the dialog. If any of the fields in the form are filled by the user, the initial element will not be revisited unless its form item variable, corresponding to its name attribute, is cleared.

In the following example, the user is prompted for starting and destination cities. If recognition fails twice due to a timeout (noinput) or a misrecognition (nomatch), the code sets the value of the variable "init" to true. Once the variable corresponding to the initial element is set, the VoiceXML interpreter no longer executes the contents of the element and moves on to the other fields in the dialog in source order.

<initial name="init">
  <prompt>
    <audio>please tell me your starting and destination cities.</audio>
  </prompt>
  <catch event="nomatch noinput">
    <audio>sorry, i didn't catch that.</audio>
    <audio>please say where you'd like to go to and from.</audio>
  </catch>
  <catch event="nomatch noinput" count="2">
    <audio>sorry, i didn't catch that.</audio>
    <assign name="init" expr="true"/>
    <reprompt/>
  </catch>
  <help>
     <audio>to book a flight you need to specify 
     your origin and destination cities.</audio>
     <audio>for example, you can say, 
     "from san francisco, california 
     to boston massachusetts."</audio>
     <reprompt/>
  </help>
</initial>

If the user doesn't provide all the necessary pieces of information when prompted by the initial element, the VoiceXML interpreter executes the contents of individual fields, if they exist, to collect the missing information. If, for example, the caller says, "to Providence, Rhode Island," the "to" slot will have been filled but the "from" slot will remain empty. To help the caller complete the dialog successfully, you should define a field element named "from" or a field with its slot attribute set to "from". This field should prompt the caller for the starting point of their trip. Because the caller could just as easily have said, "from Sioux Falls, South Dakota," you should also define a field element named "to" or a field with its slot attribute set to "to". This field should prompt the caller for the destination point of their trip.
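
The way the interpreter decides which form item to visit next can be pictured with a small simulation. This is a simplified sketch, not the interpreter's actual implementation; the names formItems, applyResult, and selectNextItem are hypothetical. It assumes each field is matched to a slot by its slot attribute or, failing that, by its name, and that the initial element's variable is set once any field has been filled.

```javascript
// Form items in source order. "slot" is optional and defaults to the name.
var formItems = [
  { name: "init", kind: "initial" },
  { name: "origin", kind: "field", slot: "from" },
  { name: "to", kind: "field" },
  { name: "confirm", kind: "field" }
];

// Form item variables; undefined means "not yet filled".
var vars = {};

// Copies slot values from a recognition result into the matching fields.
function applyResult(result) {
  var filledAny = false;
  formItems.forEach(function (item) {
    if (item.kind !== "field") { return; }
    var slot = item.slot || item.name;
    if (result[slot] !== undefined) {
      vars[item.name] = result[slot];
      filledAny = true;
    }
  });
  // Once any field is filled, the initial element is not revisited.
  if (filledAny) { vars.init = true; }
}

// Returns the name of the first form item whose variable is still undefined.
function selectNextItem() {
  for (var i = 0; i < formItems.length; i++) {
    if (vars[formItems[i].name] === undefined) {
      return formItems[i].name;
    }
  }
  return null;
}

var first = selectNextItem();            // nothing filled yet
applyResult({ to: "providence_ri" });    // caller said only the destination
var second = selectNextItem();
```

In this run first is "init" and second is "origin": the destination was supplied in response to the initial prompt, so the only remaining data collection step is the field tied to the "from" slot.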

The following code snippet defines two fields. The first collects the user's starting location. Because the field is named "origin", it specifies a slot attribute with value "from" to associate it with the "from" slot in the form-level grammar. The second field is named identically to the slot to which it corresponds; thus, it does not specify the slot attribute.

<field name="origin" slot="from">
  <grammar type="application/srgs+xml" mode="voice" src="airports-voice.grxml"/>
  <prompt>
    <audio>where are you flying from?</audio>
  </prompt>
  <filled>
    <prompt>okay, you'll be flying from
       <value expr="csobj.GetCSTTS(origin)"/> 
    </prompt> 
  </filled>
</field>

<field name="to">
  <grammar type="application/srgs+xml" mode="voice" src="airports-voice.grxml"/>
  <prompt>
    <audio>where do you want to go? </audio>
  </prompt>
  <filled>
    <prompt>your destination is 
       <value expr="csobj.GetCSTTS(to)"/> 
    </prompt>  
  </filled>
</field>

To ensure that the dialog has collected the information from the user correctly, consider defining a confirmation field. This field should be the last one defined in the dialog so that the VoiceXML interpreter visits it only after the data collection fields are filled.

The following code plays back the origin and destination cities, and asks the user if they are correct. If correct, the code navigates to a dialog that collects additional information with the ultimate goal of booking the flight. If incorrect, the code clears the form item variables of each field. Doing so restarts form interpretation beginning again with the contents of the initial element.

<field name="confirm" type="boolean">
   <prompt>
      Okay!  To summarize, you'd like to fly from 
      <value expr="csobj.GetCSTTS(origin)"/> 
      to
      <value expr="csobj.GetCSTTS(to)"/>
      is that correct?
   </prompt>
   <catch event="nomatch noinput">
      Sorry I didn't get that.
      <reprompt/>
   </catch>
   <filled>
      <if cond="confirm">
         <goto next="#get_num_passengers"/>
      <else/>
         <clear/>
      </if>
   </filled>  
</field>
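
The effect of the clear element in the negative branch can be sketched in JavaScript as well. The snippet below is an illustrative model only; handleConfirmation and the vars object are hypothetical names, and the "goto" result merely stands in for the transition to the next dialog.

```javascript
// Form item variables after a completed pass through the dialog,
// with the caller having answered "no" to the confirmation prompt.
var vars = {
  init: true,
  origin: "san_francisco_ca",
  to: "boston_ma",
  confirm: false
};

// Models the <filled> handler of the confirmation field.
function handleConfirmation(vars) {
  if (vars.confirm) {
    return "goto";               // stands in for <goto next="..."/>
  }
  // <clear/> with no namelist resets every form item variable, including
  // the one belonging to the initial element.
  Object.keys(vars).forEach(function (name) {
    vars[name] = undefined;
  });
  return "restart";
}

var outcome = handleConfirmation(vars);
```

Because the caller rejected the summary, outcome is "restart" and every variable, including init, is undefined again, which is why the interpreter begins once more with the initial element.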

The complete listing of the code for this example follows:

<?xml version="1.0"?>
<vxml version="2.1"
 xmlns="http://www.w3.org/2001/vxml">

<!-- helper script that maps city/state names to audio/tts -->
<script src="citystate.js"/>
<script>
  var csobj = new CityStateReader();
</script>

<!-- application entrypoint -->
<form id="start">
<block>
    <audio src="wav/01.wav">Welcome to ACME travel.</audio>
    <break time="500ms"/>
    <goto next="#get_origin_dest"/>
</block>
</form> 

<!-- retrieves the origin and destination using mixed initiative -->
<form id="get_origin_dest">
   <property name="confidencelevel" value="0.4"/>
   <grammar src="travel-voice.grxml" mode="voice" type="application/srgs+xml"/>

   <catch event="nomatch noinput">
      <audio src="wav/04.wav">sorry, i didn't catch that.</audio>
      <reprompt/>
   </catch>

   <!-- designates the initial state in a mixed initiative dialog -->
   <initial name="init">
      <prompt>
        <audio src="wav/03.wav">
           please tell me your starting and destination cities.
        </audio>
      </prompt>
      <catch event="nomatch noinput">
        <audio src="wav/04.wav">sorry, i didn't catch that.</audio>
        <audio src="wav/11.wav">
           please say where you'd like to go to and from.
        </audio>
      </catch>
      <catch event="nomatch noinput" count="2">
        <audio src="wav/04.wav">sorry, i didn't catch that.</audio>
        <assign name="init" expr="true"/>
        <reprompt/>
      </catch>
      <help>
        to book a flight you need to specify your 
        origin and destination cities.
        for example, you can say, from san francisco, 
        california to boston massachusetts.
      </help>
    </initial>

    <!-- retrieve origin in case it didn't happen in initial state -->
    <field name="origin" slot="from">
      <grammar src="airports-voice.grxml" mode="voice" type="application/srgs+xml"/>
      <prompt>
        <audio src="wav/06.wav">where are you flying from?</audio>
      </prompt>
      <filled>
        <prompt> 
           <value expr="csobj.GetCSTTS(origin)"/>
        </prompt> 
      </filled>
    </field>

    <!-- retrieve destination in case it didn't happen in initial state -->
    <field name="to">
      <grammar src="airports-voice.grxml" mode="voice" type="application/srgs+xml"/>
      <prompt>
        <audio src="wav/08.wav"> where do you want to go? </audio>
      </prompt>
      <filled>
        <prompt> 
           <value expr="csobj.GetCSTTS(to)"/> 
        </prompt>  
      </filled>
    </field>

    <!-- confirm origin and destination -->
    <field name="confirm" type="boolean">
      <prompt>
        <audio src="wav/10.wav">
           Okay! To summarize, you'd like to fly from
        </audio>
        <value expr="csobj.GetCSTTS(origin)"/>
        <audio src="wav/05.wav"> to </audio>
        <value expr="csobj.GetCSTTS(to)"/>
        <audio>is that correct?</audio>
      </prompt>
      <catch event="nomatch noinput">
        Sorry I didn't get that.
        <reprompt/>
      </catch>
      <filled>
        <if cond="confirm">
          <goto next="#bookit"/>
        <else/>
          <clear/>
        </if>
      </filled>
    </field>
   </form>

   <!-- move along now that origin and dest have been collected... -->
   <form id="bookit">
   <block>
      booking your flight
      <goto next="#start"/>
   </block>
   </form>
</vxml>

A listing of the CityStateReader class follows:

function CityStateReader()
{
  this._statecodes = { "al" : "alabama",
   "ak" : "alaska",
   "az" : "arizona",
   "ar" : "arkansas",
   "ca" : "california",
   "co" : "colorado",
   "ct" : "connecticut",
   "dc" : "d_c",
   "de" : "delaware",
   "fl" : "florida",
   "ga" : "georgia",
   "hi" : "hawaii",
   "id" : "idaho",
   "il" : "illinois",
   "in" : "indiana",
   "ia" : "iowa",
   "ks" : "kansas",
   "ky" : "kentucky",
   "la" : "louisiana",
   "me" : "maine",
   "md" : "maryland",
   "ma" : "massachusetts",
   "mi" : "michigan",
   "mn" : "minnesota",
   "ms" : "mississippi",
   "mo" : "missouri",
   "mt" : "montana",
   "ne" : "nebraska",
   "nv" : "nevada",
   "nh" : "new_hampshire",
   "nj" : "new_jersey",
   "nm" : "new_mexico",
   "ny" : "new_york",
   "nc" : "north_carolina",
   "nd" : "north_dakota",
   "oh" : "ohio",
   "ok" : "oklahoma",
   "or" : "oregon",
   "pa" : "pennsylvania",
   "ri" : "rhode_island",
   "sc" : "south_carolina",
   "sd" : "south_dakota",
   "tn" : "tennessee",
   "tx" : "texas",
   "ut" : "utah",
   "vt" : "vermont",
   "va" : "virginia",
   "wa" : "washington",
   "wv" : "west_virginia",
   "wi" : "wisconsin",
   "wy" : "wyoming"};
}

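// Maps a two-letter state code to the state name; returns "" if unknown.
CityStateReader.prototype.Code2StateName = function(code)
{
  var state = this._statecodes[code.toLowerCase()];
  return (state ? state : "");
};

// Converts a slot value such as "boston_ma" into text for TTS playback,
// for example "boston massachusetts".
CityStateReader.prototype.GetCSTTS = function(cs)
{
  // The last underscore-separated part is the state code; the rest is the city.
  var parts = cs.split(/_/);
  var state = this.Code2StateName(parts.pop());
  return parts.join(" ") + " " + state;
};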
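
To see what GetCSTTS hands to the text-to-speech engine, the class can be exercised outside the VoiceXML interpreter. The sketch below repeats the class with a deliberately shortened state table so that it runs on its own; the sample slot values follow the naming used by the airports grammar, and "boston_zz" is a made-up value with an unknown state code.

```javascript
function CityStateReader()
{
  // Shortened table for illustration; the full listing covers every state.
  this._statecodes = { "ca" : "california",
   "ma" : "massachusetts",
   "or" : "oregon" };
}

CityStateReader.prototype.Code2StateName = function(code)
{
  var state = this._statecodes[code.toLowerCase()];
  return (state ? state : "");
};

CityStateReader.prototype.GetCSTTS = function(cs)
{
  var parts = cs.split(/_/);
  var state = this.Code2StateName(parts.pop());
  return parts.join(" ") + " " + state;
};

var csobj = new CityStateReader();
var spoken = csobj.GetCSTTS("san_francisco_ca");
var unknown = csobj.GetCSTTS("boston_zz");
```

Here spoken holds "san francisco california". For the unknown state code, the result is just the city followed by a trailing space, so a production version might want to trim the string or handle the missing state explicitly.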
