Building Applications Frequently Asked Questions
This FAQ contains answers to questions about building VoiceXML applications using Tellme
- How do I prompt users or deliver information to callers?
- What formats do you support for recorded audio?
- What are the optimal settings for recorded audio?
- What happens if I use non-optimal audio file formats?
- What does Tellme use for producing text-to-speech?
- How good is the text-to-speech engine?
- What "voices" are available for text-to-speech?
- In what ways can I alter the TTS output?
- Does the TTS engine support languages other than English?
Q: What type of language is VoiceXML?
A: VoiceXML is a declarative, XML-based language composed of elements that describe the human-machine
interaction provided by a voice response system. This includes:
implement client-side logic.
- Output of audio files and synthesized speech (text-to-speech).
- Recognition of spoken and DTMF input.
- Control of telephony features such as call transfer and disconnect.
- Direction of the call flow based on user input.
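As an illustration of the four capabilities listed above, here is a minimal sketch of a complete VoiceXML document. It assumes a VoiceXML 2.0 interpreter; the dialog names, prompt wording, and menu choices are invented for the example and are not taken from any Tellme application.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

  <!-- Recognition: each choice can be selected by speaking its text
       or by pressing the DTMF key named in its dtmf attribute. -->
  <menu id="main">
    <!-- Output: this prompt has no audio file, so it is synthesized. -->
    <prompt>Say hours or press 1. Say goodbye or press 2.</prompt>

    <!-- Call flow: the next attribute routes the caller to another dialog. -->
    <choice dtmf="1" next="#hours">hours</choice>
    <choice dtmf="2" next="#goodbye">goodbye</choice>

    <noinput>I did not hear you. <reprompt/></noinput>
    <nomatch>I did not understand. <reprompt/></nomatch>
  </menu>

  <form id="hours">
    <block>
      <prompt>We are open nine to five, Monday through Friday.</prompt>
      <goto next="#main"/>
    </block>
  </form>

  <form id="goodbye">
    <block>
      <prompt>Goodbye.</prompt>
      <!-- Telephony control: end the call. -->
      <disconnect/>
    </block>
  </form>

</vxml>
```

The document is declarative: it describes the dialogs and the transitions between them, and the interpreter handles playing prompts, listening for input, and following the `next` targets.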
Q: How do you handle speakers with foreign accents?
A: Tellme's work on
, which naturally is designed to be usable by the widest range of speakers, has shown that the Microsoft speech
recognition engine employed by the Tellme Platform does a very good job with callers with strong foreign
Q: How do I prompt users or deliver information to callers?
A: Audio output is used to prompt callers for input and to deliver information to them. The audio may be
specified as a pre-recorded audio file, or as text that is converted to speech on the fly using the Tellme Voice
Application Network's text-to-speech (TTS) engine.
Similar to how images are specified in HTML, the audio file is specified by a URL. The Tellme Platform will
retrieve the audio file before it is played to the user. The program can also specify text to be played to the
user (using TTS) in the event that the audio file cannot be retrieved.
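The audio-file-with-TTS-fallback pattern described above might look like the following sketch. It assumes VoiceXML 2.0 syntax, in which the text inside an `<audio>` element is spoken only if the file at `src` cannot be played; the URL and the prompt wording are hypothetical.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <block>
      <prompt>
        <!-- Hypothetical URL: the platform fetches this file and plays it. -->
        <audio src="http://example.com/audio/welcome.wav">
          <!-- Fallback text, rendered with TTS if the file cannot be retrieved. -->
          Welcome to the example application.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```

Keeping the fallback text identical to the recorded script means the caller hears the same message either way, only in a different voice.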
Pre-recorded audio is almost always preferred in production applications, as it provides the most natural
sounding interface. Text-to-speech is useful for rapid application prototyping and for generating output from
data whose content cannot be guessed in advance. For example, an application that reads a user's e-mail over the
Q: What formats do you support for recorded audio?
A: WAV files using either PCM or μ-law encoding with the following parameters:
- Sample size: 8- or 16-bit
- Sample rate (kHz): 8, 11.025, 16, 22.05, 44.1
Based on business need, we may support other formats in the future.
Q: What are the optimal settings for recorded audio?
A: For optimal playback efficiency, audio should be recorded in 8 kHz, 8-bit μ-law (G.711), mono format.
Although the Tellme platform supports higher fidelity formats, the current phone network only supports audio at
Q: What happens if I use non-optimal audio file formats?
A: An application using a non-optimal audio file format will operate correctly, but will waste network
resources. Non-optimal audio files are converted on the fly to 8 kHz, 8-bit, mono μ-law as they are played to the
user. If a great many non-optimal audio files require conversion at once, the audio servers can become inundated
with conversion requests, resulting in poor audio playback. It will also take longer to retrieve the
larger, high-resolution audio files over the Internet, wasting Internet bandwidth for both Tellme and the
Q: How good is the text-to-speech engine?
A: The TTS engine can convert most English sentences to understandable speech. It also handles certain
special cases, such as text containing dollar amounts. For example, it will read "$3.75" as "three dollars and
Though text-to-speech technology has come a long way since the talking computer in "War Games", the output still
has a computer-generated quality. Pre-recorded audio, though time-consuming to create, is the only way to
generate a natural, human-sounding voice application interface. The widespread use of TTS within an application
is not recommended.
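For prototyping, a TTS-only prompt containing a dollar amount, the special case mentioned above, could be sketched as follows. It assumes VoiceXML 2.0 syntax; the prompt wording is invented, and exactly how the amount is spoken depends on the TTS engine.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <block>
      <!-- No audio element here, so the whole prompt is synthesized. -->
      <prompt>Your total is $3.75.</prompt>
    </block>
  </form>
</vxml>
```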
Q: What "voices" are available for text-to-speech?
A: The Tellme TTS engine supports only one voice today, an adult female. Tellme may make additional voices
available in the future, depending on business need and customer feedback.
Q: In what ways can I alter the TTS output?
A: None of the TTS parameters (talking speed, pitch, voice, etc.) may be modified by voice applications
running on today's Tellme Voice Application Network. These parameters may be exposed in the future based on
business need and customer/developer feedback.
Q: Does the TTS engine support languages other than English?
A: No. The TTS engine is designed to produce correct output for English words only. If the engine encounters
a word it does not know, it will make a best guess at how it should be pronounced. This will produce poor
pronunciations for most foreign languages.