Building Applications Frequently Asked Questions
This FAQ answers questions about building VoiceXML applications on the Tellme Platform.
- How do I tell the application what spoken input to expect?
- Does Tellme provide any "pre-built" grammars?
- What is grammar "tuning"?
- Does Tellme support recognition of languages other than English?
- What are the limits on grammar size?
Q: What type of language is VoiceXML?
A: VoiceXML is a declarative, XML-based language composed of elements that describe the human-machine
interaction provided by a voice response system. Its elements cover:
- Scripting (ECMAScript) to implement client-side logic.
- Output of audio files and synthesized speech (text-to-speech).
- Recognition of spoken and DTMF input.
- Control of telephony features such as call transfer and disconnect.
- Direction of the call flow based on user input.
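To make the element categories above concrete, here is a minimal Python sketch that uses the standard
xml.etree.ElementTree module to assemble a one-field VoiceXML 2.1 dialog: a prompt (output), a grammar reference
(input), a nomatch handler, and a filled block that continues the call flow. The field name travel and the grammar
file travel.grxml are hypothetical, and the sketch only generates the markup; a VoiceXML interpreter such as the
Tellme Platform is still needed to execute it.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_travel_dialog(grammar_src: str) -> str:
    """Build a one-field VoiceXML 2.1 dialog and return it as a string."""
    root = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(root, "form", {"id": "commute"})
    field = ET.SubElement(form, "field", {"name": "travel"})  # hypothetical field name

    # Output: the question is rendered with text-to-speech.
    ET.SubElement(field, "prompt").text = "How do you travel from home to work?"

    # Input: an external SRGS/GRXML grammar defines the legal utterances.
    ET.SubElement(field, "grammar", {"src": grammar_src, "type": "application/srgs+xml"})

    # Error handling: re-ask when the caller says something out of grammar.
    nomatch = ET.SubElement(field, "nomatch")
    ET.SubElement(nomatch, "prompt").text = "Sorry, I didn't catch that."
    ET.SubElement(nomatch, "reprompt")

    # Call flow: once the field is filled, read the recognized value back.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You said "
    ET.SubElement(confirm, "value", {"expr": "travel"})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_travel_dialog("travel.grxml"))  # hypothetical grammar file
```

Generating the markup from code is only one option; the same document can just as well be written by hand or
produced by a server-side template.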
Q: How do I tell the application what spoken input to expect?
A: The voice application uses a grammar to define which utterances the caller can legally say at a particular
point in the application. (An utterance is speech input before the voice recognizer has recognized it as a
specific response.) A grammar represents the set of accepted inputs as a set of rules that, much like regular
expressions, describe the allowed sequences of words. A grammar for the possible answers to the question "How do
you travel from home to work?" might, in SRGS/GRXML format, list each accepted phrase as an item inside a one-of
element.
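As a rough illustration, the Python sketch below builds an SRGS/GRXML grammar whose root rule is a single one-of
list of literal phrases, then checks whether a piece of text is one of those phrases. The sample phrases are
hypothetical, and the accepts function only compares already-transcribed text: it stands in for the recognizer
and ignores the rule references, repeats, weights, and semantic tags that real SRGS grammars support.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_one_of_grammar(rule_id: str, phrases: Iterable[str]) -> str:
    """Return an SRGS/GRXML grammar whose root rule accepts exactly one of the phrases."""
    grammar = ET.Element("grammar", {
        "xmlns": SRGS_NS,
        "version": "1.0",
        "xml:lang": "en-US",
        "mode": "voice",
        "root": rule_id,
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        # Each <item> is one alternative the caller may say.
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def accepts(grammar_xml: str, text: str) -> bool:
    """Naively test already-transcribed text against a flat one-of grammar."""
    ns = {"g": SRGS_NS}
    root = ET.fromstring(grammar_xml)
    rule = root.find(f"g:rule[@id='{root.get('root')}']", ns)
    if rule is None:
        return False
    items = {_normalize(item.text or "") for item in rule.iterfind("g:one-of/g:item", ns)}
    return _normalize(text) in items


if __name__ == "__main__":
    grammar = build_one_of_grammar(
        "travel", ["I drive", "by car", "by bus", "I take the train", "I walk"]
    )
    print(grammar)
    print(accepts(grammar, "By Bus"))  # True
```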
VoiceXML lets the developer specify which grammars are active at any point in the application through grammar
scoping. Grammars are included in a VoiceXML file either inline or by reference to external grammar files.
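Grammar scoping can be modeled, very loosely, as a chain of scopes running from the current field out to the
document. In VoiceXML, when input matches more than one active grammar, the grammar in the innermost scope takes
precedence. The Python sketch below assumes each grammar is simply a hypothetical set of phrases; it is not how a
VoiceXML interpreter is implemented and only demonstrates the lookup order.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Set, Tuple


@dataclass
class Scope:
    """One level of grammar scope, e.g. a field, a form, or a document."""
    name: str
    grammars: Dict[str, Set[str]]  # hypothetical: grammar name -> accepted phrases
    parent: Optional["Scope"] = None


def active_grammars(scope: Optional[Scope]) -> Iterator[Tuple[str, str, Set[str]]]:
    """Yield every active grammar, innermost scope first (highest precedence)."""
    while scope is not None:
        for grammar_name, phrases in scope.grammars.items():
            yield scope.name, grammar_name, phrases
        scope = scope.parent


def match(scope: Scope, text: str) -> Optional[Tuple[str, str]]:
    """Return (scope name, grammar name) of the first active grammar accepting text."""
    for scope_name, grammar_name, phrases in active_grammars(scope):
        if text in phrases:
            return scope_name, grammar_name
    return None  # no active grammar accepts the input: a nomatch


if __name__ == "__main__":
    document = Scope("document", {"navigation": {"main menu", "goodbye", "help"}})
    form = Scope("form", {"commute_help": {"help"}}, parent=document)
    travel = Scope("field", {"travel": {"i drive", "by bus"}}, parent=form)
    print(match(travel, "by bus"))     # ('field', 'travel')
    print(match(travel, "main menu"))  # ('document', 'navigation')
    print(match(travel, "help"))       # ('form', 'commute_help')
```

Because "help" is defined at both form and document scope, the form-level grammar wins while the caller is in
the travel field.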
The Tellme Platform currently supports SRGS/GRXML grammar format, with legacy support for the GSL grammar format.
Q: Does Tellme provide any "pre-built" grammars?
A: Yes, the Tellme Platform provides access to many pre-built grammars for inputs that are commonly used,
difficult to create, or require constant maintenance. They include the grammars below; the Tellme grammar
documentation contains the full list of grammars and their descriptions.
- General: Yes/No
- Credit Cards: Expiration date, Expiration month, Expiration year, Credit card number
- Date/Time: Day of month, Day of year, Month, Year, Date, AM/PM, TimeDuration (in minutes or days), Hour,
- Financial: US Dollars (no cents), US Money (dollars and cents),
- Locations: City/State
- Numbers: Digits, Natural numbers, Percentages, Social Security numbers
- Telephone: US Phone number, Phone extension, Area Code, 7-digit phone, 10-digit phone
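Part of what makes a grammar such as Natural numbers tedious to build is that the recognized words must be turned
into a value. The Python sketch below shows only that interpretation step, for English number words below one
million. It assumes the grammar has already constrained the word sequence, so it does not reject ill-formed input
such as "twenty twenty"; it is an illustration, not Tellme's grammar.

```python
_UNIT_WORDS = (
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen"
).split()
UNITS = {word: value for value, word in enumerate(_UNIT_WORDS)}
TENS = {
    word: 10 * value
    for value, word in enumerate(
        "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2
    )
}


def words_to_number(text: str) -> int:
    """Convert recognized English number words (below one million) to an int."""
    total = 0    # value of completed "thousand" groups
    current = 0  # value of the group being built (0-999)
    words = text.lower().replace("-", " ").split()
    if not words:
        raise ValueError("no number words")
    for word in words:
        if word == "and":  # as in "one hundred and five"
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


if __name__ == "__main__":
    print(words_to_number("two hundred thirty-four thousand five hundred sixty-seven"))
```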
Q: What is grammar "tuning"?
A: A voice application, like any user-centric application, is prone to problems that may be discovered only
through formal usability testing or by observing the application in use. Poor speech recognition accuracy is a
common problem in voice applications, and it is most often caused by poor grammar implementation. When callers
mispronounce words or say things the grammar designer did not anticipate, the recognizer cannot match their input
against the grammar. Poorly designed grammars containing many hard-to-distinguish entries also produce many
misrecognitions.
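One way to spot hard-to-distinguish entries before testing is to compare every pair of entries and flag the ones
that look alike. The Python sketch below uses difflib.SequenceMatcher on spelling as a crude stand-in for
acoustic similarity (a real tool would compare pronunciations), and the 0.8 default threshold is an arbitrary
assumption. A grammar with n entries has n(n-1)/2 pairs, so this check grows quadratically with grammar size.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def confusable_pairs(entries: List[str], threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Flag pairs of grammar entries whose spellings are at least `threshold` similar.

    Spelling similarity is only a rough proxy for how alike two entries sound.
    """
    flagged = []
    # combinations() visits each of the n*(n-1)/2 unordered pairs exactly once.
    for a, b in combinations(entries, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((a, b, round(ratio, 2)))
    return flagged
```

For example, confusable_pairs(["Kent", "Kant", "Denver"], threshold=0.7) returns [("Kent", "Kant", 0.75)].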
Grammar tuning is the process of improving recognition accuracy by modifying a grammar based on an analysis of
its performance. Tuning is usually part of an iterative cycle of usability testing and application improvement,
and may involve adding commonly spoken phrases to the grammar, removing highly confusable words, and adding
alternative pronunciations for words that callers say in different ways.
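The analysis step of tuning can be sketched as a small report over a recognition log. The Python sketch below
assumes a hypothetical log format in which each entry pairs a human transcription of what the caller said with
what the recognizer returned (None for a no-match). It also assumes the recognizer returns the literal phrase, so
results can be compared directly with transcriptions. Frequent no-match transcriptions are candidates to add to
the grammar, and frequent mismatches point to confusable entries.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (human transcription, recognizer result or None for no-match)
LogEntry = Tuple[str, Optional[str]]


def tuning_report(log: List[LogEntry], top: int = 3) -> Dict[str, object]:
    """Summarize a recognition log to guide grammar tuning."""
    total = len(log)
    # Utterances the grammar could not match at all: candidates to add.
    nomatch = Counter(said for said, heard in log if heard is None)
    # Utterances matched to the wrong entry: candidates for confusable words.
    confused = Counter(
        (said, heard) for said, heard in log if heard is not None and heard != said
    )
    return {
        "nomatch_rate": sum(nomatch.values()) / total if total else 0.0,
        "add_candidates": [said for said, _ in nomatch.most_common(top)],
        "confused_pairs": confused.most_common(top),
    }


if __name__ == "__main__":
    sample_log = [
        ("by bus", "by bus"),
        ("i take the subway", None),
        ("i take the subway", None),
        ("carpool", None),
        ("i drive", "i ride"),
        ("by bus", "by bus"),
    ]
    print(tuning_report(sample_log))
```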
Q: How do you handle speakers with foreign accents?
A: Tellme's work on …, which is naturally designed to be usable by the widest range of speakers, has shown that
the Microsoft speech recognition engine employed by the Tellme Platform does a very good job with callers who
have strong foreign accents.
Q: Does Tellme support recognition of languages other than English?
A: Not currently. For now, Tellme is focused on maximizing reliability of English recognition over the
phone. Adding a new language is a non-trivial task. Even if a model exists which lets a recognizer understand a
native speaker under ideal acoustic conditions, making that same model work reliably when confronted with the
noisy audio environment of the phone is very difficult.
Q: What are the limits on grammar size?
A: Although the speech recognizer doesn't impose a specific limit on grammar size, several practical
considerations effectively limit grammars to tens of thousands of entries. First, as a grammar grows, the
recognizer must test each utterance against more possibilities, which slows recognition considerably. Second, the
larger the grammar, the more likely it is to contain easily confusable words, which lowers recognition accuracy.
However, with careful planning, it is
possible to create large grammars that are still effective. Tellme has production grammars with up to thirty