TTS Engine Behavior

Text-to-Speech (TTS) allows you to develop applications without relying on pre-recorded audio for every voice prompt. You can also use TTS to make changes or add announcements to your voice applications quickly, without having to record new audio. If you also use the [24]7 Experience Manager, you can make these changes without deploying new VoiceXML code.

For example, suppose your call center suddenly becomes unavailable to receive calls. You can quickly add a temporary TTS prompt that announces the issue and gives callers an alternate number to call. When your call center is available again, you can remove the TTS announcement so that callers hear the original recording. Without TTS, you must record and deploy a new audio file, which takes much longer.

In general, Tellme recommends that you supply TTS for all your audio. TTS then also functions as a backup for situations in which an audio file cannot be played.
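
As a minimal sketch of this backup pattern, the TTS text can be placed inside the audio element so that it is spoken if the recording cannot be played. The file name welcome.wav is a hypothetical placeholder, and the snippet assumes the standard VoiceXML 2.0 behavior for alternate content in the audio element.

```xml
<prompt>
  <!-- welcome.wav is a hypothetical recording. If it cannot be fetched or
       played, the enclosed text is rendered by the TTS engine instead. -->
  <audio src="welcome.wav">Welcome to Tellme.</audio>
</prompt>
```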

The current version of the TTS engine supports the following combinations of languages and locales, and the voice names associated with each:

Language                Gender  Voice Name
US English              Female  zira
US English              Male    tom
Latin American Spanish  Female  teresa, hilda
UK English              Female  hazel
UK English              Male    george
Australia English       Female  hayley
Australia English       Male    james
Canada English          Female  heather
Spain Spanish           Female  helena
France French           Female  hortense
Canada French           Female  harmonie, isabelle
Canada French           Male    claude
Italian                 Female  lucia
German                  Female  hedda
Chinese                 Male    kangkang
Chinese                 Female  yaoyao
Japanese                Male    ichiro
Japanese                Female  ayumi
Brazilian Portuguese    Male    daniel

To specify the voice name, enter the voice element within the audio element. For example:

        <voice name="tom">
                Welcome to Tellme.
        </voice>

After the end of the SSML voice element (</voice>), the TTS engine goes back to the default voice.

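The following table shows the voice that the current version of the TTS engine is expected to use for each combination of the xml:lang, gender, and name attributes of the voice element. A dash (-) indicates that the attribute is not specified.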

xml:lang gender name Expected Voice
en-US female zira zira
en-US male tom tom
es-MX female teresa teresa
es-MX female hilda hilda
fr-CA female isabelle isabelle
fr-CA female harmonie harmonie
fr-CA male claude claude
fr-FR female hortense hortense
en-AU female hayley hayley
en-AU male james james
es-ES female helena helena
en-GB female hazel hazel
en-GB male george george
en-CA female heather heather
de-DE female hedda hedda
it-IT female lucia lucia
pt-BR male daniel daniel
zh-CN male kangkang kangkang
zh-CN female yaoyao yaoyao
ja-JP male ichiro ichiro
ja-JP female ayumi ayumi
en-US female - zira
en-US male - tom
es-MX female - teresa
fr-CA female - isabelle
fr-CA male - claude
en-AU female - hayley
en-AU male - james
es-ES female - helena
en-GB female - hazel
en-GB male - george
en-CA female - heather
de-DE female - hedda
it-IT female - lucia
pt-BR female - zira
zh-CN female - zira
ja-JP female - zira
pt-BR - daniel daniel
zh-CN - kangkang kangkang
zh-CN - yaoyao yaoyao
ja-JP - ichiro ichiro
ja-JP - ayumi ayumi
- male daniel daniel
- male kangkang kangkang
- female yaoyao yaoyao
- male yaoyao yaoyao
- female kangkang kangkang
- male ichiro ichiro
- female ayumi ayumi
- male ayumi ayumi
- female ichiro ichiro
- - zira zira
- - tom tom
- - teresa teresa
- - hilda hilda
- - isabelle isabelle
- - harmonie harmonie
- - claude claude
- - hayley hayley
- - james james
- - helena helena
- - hazel hazel
- - george george
- - heather heather
- - hedda hedda
- - lucia lucia
- - daniel daniel
- - kangkang kangkang
- - yaoyao yaoyao
- - ichiro ichiro
- - ayumi ayumi
en-US - - zira
es-MX - - teresa
fr-CA - - isabelle
fr-FR - - hortense
en-AU - - hayley
es-ES - - helena
en-GB - - hazel
en-CA - - heather
de-DE - - hedda
it-IT - - lucia
pt-BR - - zira
zh-CN - - zira
ja-JP - - zira
- female - zira
- male - tom
- - - zira
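
As an illustration of how the table reads, the sketch below selects voices by attribute combination instead of always naming them. It assumes the voice elements are placed in a prompt (the same elements can be placed within the audio element as described earlier), and the expected voices in the comments are taken from the table rows, not from a test against the engine.

```xml
<prompt>
  <!-- xml:lang and gender only: the table lists claude for fr-CA + male. -->
  <voice xml:lang="fr-CA" gender="male">
    Bienvenue chez Tellme.
  </voice>
  <!-- name only: the table lists hazel for this case. -->
  <voice name="hazel">
    Welcome to Tellme.
  </voice>
  <!-- Text outside a voice element uses the default voice. -->
  Goodbye.
</prompt>
```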

For information about the Speech Synthesis Markup Language (SSML) elements that the TTS engine supports, see the Speech Synthesis Markup Element Reference.

The TTS engine pronounces some acronyms as words and spells out others. Tellme recommends that you spell out acronyms to ensure their correct pronunciation, and that you test your text-to-speech output before delivering your final application. For example, to have SAP spelled out letter by letter, put a space between the letters: S A P.


This section covers how email addresses, web addresses, file paths, phone numbers, and mailing addresses should be formatted and how they are read by the TTS engine.

The format of an email address is alias@hostname. If a portion of the alias is recognized as a word, it will be pronounced that way; otherwise, digits are pronounced individually.

Valid email alias separators are ".", "-" and "_".

Text                            Pronunciation
no-reply@example.com            N O dash R E P L Y at example dot com
paul70@example.com              P A U L seven zero at example dot com
mailto://webmaster@example.com  mailto colon slash slash webmaster at example dot com

A web address can be written with or without the http:// prefix. If the URL is word-based, the words are pronounced. Numbers in addresses are read individually.

Text                           Pronunciation
www.cnn.com                    W W W dot C N N dot com
http://www.tellme.com          H T T P W W W dot tell me dot com
http://canada.gc.ca/home.html  H T T P Canada dot G C dot C A slash home dot H T M L

File paths generally take one of the following forms:

  • DriveLetter:\DirectoryName\DirectoryName
  • \\DirectoryName\DirectoryName
  • \DirectoryName

Spaces are valid only if the entire path is enclosed in quotes. For example, C:\My Documents\paper.doc is invalid, whereas "C:\My Documents\paper.doc" is valid.

  • Strings with non-letters are read out character by character.
  • In file paths, the "\" symbol is pronounced as "backslash".

Text                      Pronunciation
c:\user\documents         c colon backslash user backslash documents
\\computer\private\stash  backslash backslash computer backslash private backslash stash
\home\user-home           backslash home backslash user home

  • Phone numbers are generally organized as <area code> <exchange> <number>, such as 650-555-8355.
  • Brief pauses occur between number segments.
  • Unlike regular numbers, phone numbers are not pronounced in pairs; digits are read individually.

    Text               Pronunciation
    +1 (425) 555-8080  "one four two five five five five eight zero eight zero"
    1 800 555 TELL     "one eight hundred five hundred fifty five Tell"
    650-555-9000       "six five zero five five five nine zero zero zero"
    ext.6572           "extension sixty five seventy two"

  • If a phone number does not include the area code, periods are not valid phone number delimiters (for example, 930.9000 is not valid but 650.930.9000 is valid).
  • Phone number delimiters are not pronounced.
  • An area code that is an exact multiple of 100 is read as a cardinal number. For example, 1-800 is pronounced "one eight hundred."
  • You can use the SSML say-as element to ensure that the TTS engine pronounces a phone number correctly.
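
The say-as approach mentioned in the last item might look like the following sketch. The interpret-as value "telephone" is an assumption based on the W3C note on say-as attribute values; the values that the TTS engine actually accepts are listed in the Speech Synthesis Markup Element Reference.

```xml
<prompt>
  Please call
  <!-- interpret-as="telephone" is an assumed value; confirm it against the
       element reference before relying on it. -->
  <say-as interpret-as="telephone">650-555-8355</say-as>
</prompt>
```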

Tellme supports both United States and international mailing address formats.

  • Numbers in an address are read as numbers (for details, see the Numbers section)
  • United States addresses are typically in the following format:
    Name of addressee
    Number  Street  (additional string)
    City, State  Zipcode
  • To ensure that the TTS engine pronounces the state abbreviation correctly, be sure to include a zip code. Also, do not include extra spaces after the city name. For example, "Mountain View , CA, 94040" may produce incorrect pronunciation.
  • You can use the SSML say-as element to ensure that the TTS engine pronounces an address correctly.
  • Latin American addresses are typically in the following format:
    Name of addressee
    Street address designator + Cardinal number(,) + (N/No/#)Cardinal number(, Second Unit designator)
    Zip Code + City (& Province) / City + Zip Code (& Province)

Pronunciation rules, each followed by example text:

  • A break occurs between a street address and a numeric street.
    5315 NE 22nd St
  • A break occurs between the city/state and the zip code.
    Mountain View, CA 94041

Direction values can be any of the following; strings are not case-sensitive.

Input Pronunciation
ne n.e n.e. ne. northeast northeast
nw n.w n.w. nw. northwest northwest
se s.e s.e. se. southeast southeast
sw s.w s.w. sw. southwest southwest
n n. no. north North
s s. so. south South
e e. east East
w w. west West

Street abbreviations can be any of the following; strings are not case-sensitive.

Input Pronunciation
alley aly(.) alley
annex annx(.) anx(.) annex
arcade arc(.) arcade
avenue ave(.) avnue(.) av(.) avn(.) avenue
bend bend
boulevard blvd(.) boul(.) boulv(.) bv(.) boulevard
bridge brg(.) bridge
brook brk(.) brook
bypass byp(.) bypa(.) bypass
causeway cswy(.) causeway
center ctr(.) cntr(.) center
circle circ(.) circl(.) circle
court ct(.) court
creek crk(.) ck(.) creek
crescent cres(.) crscnt(.) crsnt(.) crsent(.) crescent
divide div(.) dv(.) divide
drive dr(.) dv(.) drv(.) drive
estate est(.) estate
expressway exp(.) expy(.) expr(.) expw(.) expressway
extension ext(.) extension
freeway fwy(.) frwy(.) freeway
gateway gatewy(.) gtwy(.) gtway(.) gateway
highway hiway(.) hwy(.) hiwy(.) highway
junction jct(.) jctn(.) junction
lane la(.) ln(.) lane
mall mall
overpass overpass
park pk(.) prk(.) park
parkway pkway(.) pkwy(.) parkway
place pl(.) place
road rd(.) road
route rte(.) route
square sq(.) sqr(.) sqre(.) square
street st(.) strt(.) str(.) street
throughway trwy(.) throughway
turnpike tpk(.) trnpk(.) turnpike
way wy(.) way

Four-digit numbers follow some common pronunciation patterns, as listed below. You can also use the SSML say-as element to ensure that the TTS engine pronounces a number digit by digit.
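
A digit-by-digit reading with say-as could be sketched as follows. The interpret-as value "characters" is an assumption taken from the W3C note on say-as attribute values; the TTS engine may expect a different value, so check the Speech Synthesis Markup Element Reference.

```xml
<prompt>
  Your confirmation code is
  <!-- interpret-as="characters" is an assumed value; the intent is that 2348
       is read one digit at a time and not as "twenty three forty eight". -->
  <say-as interpret-as="characters">2348</say-as>
</prompt>
```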

Note. To express multiplication, you must write out the mathematical operation. For example, use "4 times 5" instead of "4*5" or "4X5".

Patterns and pronunciation rules, each followed by example text and its pronunciation:

  • 4-digit numbers without commas or decimal points are read as pairs.
    2348  twenty three forty eight
  • 4-digit numbers where the 2nd pair begins with zero: the 2nd pair is read as individual digits.
    2304  twenty three zero four
  • 4-digit numbers that begin with zero are read as individual digits.
    0234  zero two three four
  • 4-digit numbers where the 2nd pair is 00 are read in hundreds.
    1200  twelve hundred
  • 4-digit numbers from 2001 through 2009 are read as a single number.
    2008  two thousand eight

Additional pronunciation patterns:

Patterns and pronunciation rules, each followed by example text and its pronunciation:

  • Multiples are pronounced as an amount of 100's, 1000's, and so forth.
    4000  four thousand
          fifty million
  • 5 or more digits in sequence, unless the number is a multiple, are pronounced individually.
    12345  one two three four five
  • A plural number (a number with "s" or "'s" after it) is pronounced as a plural.
    4536s  forty five thirty sixes
  • Ordinal numbers, such as 1st, 2nd, 14th, and so forth, are pronounced as ordinals.
    13th  thirteenth

Normal rules of fraction pronunciation apply.

If the numerator is more than 3 digits long, or the denominator is more than 2 digits long, the fraction is pronounced as "number slash number":

    Text     Pronunciation
    123/456  one hundred twenty three slash four hundred fifty six
    1234/56  twelve thirty four slash fifty six

Currency values are pronounced, in general, as <number> <currency unit> and <number> <currency subdivision>. For example, $432.19 is pronounced as "four hundred thirty two dollars and nineteen cents." You can use the SSML say-as element to ensure that the TTS engine pronounces a currency value correctly.

Pronunciation rules, each followed by example text and its pronunciation:

  • If the value before or after the decimal point is zero, only the non-zero value is read.
    $432.00  four hundred thirty two dollars
    $0.19    nineteen cents
  • Use m or b to indicate million or billion, respectively. Capitalization and spacing do not matter.
    $432M      four hundred thirty two million dollars
    $432.19 m  four hundred thirty two point one nine million dollars
    $432B      four hundred thirty two billion dollars
    $432.19 b  four hundred thirty two point one nine billion dollars
  • Ranges are pronounced with the currency value last.
    $2 - $4  two to four dollars
    $2 - 4m  two to four million dollars
  • Yen values are read with "yen" pronounced last (yen is the only currency without a name for amounts less than whole numbers).
    JPY 123.45  one hundred twenty three point forty-five yen
  • Numbers with more than 2 digits after the decimal point have the decimal digits read individually, with just the larger currency name.
    $12.3456  twelve point three four five six dollars
  • Use currency abbreviations.
    GBP 12.34  twelve pounds sterling and thirty four pence

The following table lists the currency code abbreviations and the readout for each.

Currency Code Readout Subdivision Subdivision Readout
EUR Euro None cent(s)
GBP Pound(s) sterling p. penny/pence
JPY Japanese yen None none
USD US dollar(s) ¢ cent(s)
ARS Argentine peso(s) ARS 0.00 centavo(s)
CLP Chilean peso(s) CLP 0.00 centavo(s)
COP Colombian peso(s) COP 0.00 centavo(s)
CUP Cuban peso(s) CUP 0.00 centavo(s)
CUC Cuban convertible peso(s) CUC 0.00 centavo(s)
MXN Mexican peso(s) MXN 0.00 centavo(s)
UYU Uruguayan peso(s) UYU 0.00 centésimo(s)
NIO Nicaraguan córdoba(s) NIO 0.00 centavo(s)
DOP Dominican peso(s) DOP 0.00 centavo(s)
CRC Costa Rican colón/colones CRC 0.00 céntimo(s)
SVC El Salvadoran colón/colones (salvadoreño(s)) SVC 0.00 centavo(s)
GTQ Guatemalan quetzal(es) GTQ 0.00 centavo(s)
PAB Panamanian balboa(s) PAB 0.00 centésimo(s)
PYG Paraguayan guaraní(es) PYG 0.00 céntimo(s)
PEN Peruvian nuevo(s) PEN 0.00 céntimo(s)
BOL Bolivian boliviano(s) BOL 0.00 centavo(s)
VEB Venezuelan bolívar(es) VEB 0.00 centavo(s)

This section covers how the TTS engine pronounces date and time text. You can use the SSML say-as element to ensure that the TTS engine pronounces a date or time value correctly.
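
A date marked up with say-as might look like the sketch below. The interpret-as value "date" and the format value "mdy" (month, day, year) are assumptions based on the W3C note on say-as attribute values; see the Speech Synthesis Markup Element Reference for the values that the TTS engine supports.

```xml
<prompt>
  Your appointment is on
  <!-- interpret-as="date" with format="mdy" is an assumed pairing; confirm
       it against the element reference. -->
  <say-as interpret-as="date" format="mdy">10/17/2009</say-as>
</prompt>
```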

Note. Roman numerals in dates are not supported.

Dates can be formatted in multiple ways. Here are examples of the different formats:

  • 10/17/2009
  • 10/17/09
  • 10-17-2009
  • 10-17-09
  • 10.17.2009
  • 10.17.09
  • 17Oct2009
  • 17Oct09
  • 17 July
  • 2009/10/17
  • 10/2009

Pronunciation rules, each followed by example text and its pronunciation:

  • Days are read as ordinals (1st, 2nd, 12th, and so on).
    Feb 14 2009  "February 14th two thousand nine"
  • Decades are plural.
    1980s    nineteen eighties
    The 70s  the seventies
  • Dashes indicate duration; the TTS engine pronounces "to" between the numbers.
    Next 6-8 months  next six to eight months
  • Years ending in 00 are read as thousand.
    2/14/00  February 14th 2000

Time can be formatted in different ways. Below are examples of the different formats. In general, time is expressed in 12-hour format, with am and pm to indicate morning or evening.

  • 12:14
  • 12:14:13
  • 12:14 pm
  • 12:14 PST

Pronunciation rules, each followed by example text and its pronunciation:

  • Seconds are optional.
    12:14     twelve fourteen
    12:14:13  twelve fourteen and thirteen seconds
  • Morning and evening indicators are optional, and can be capitalized or not, with or without periods.
    12:14 pm, 12:14 PM, 12:14 p.m.  twelve fourteen P M
  • Time zones are pronounced as acronyms and are optional.
    2:15 pm PST  two fifteen P M P S T
  • If the hour starts with zero, the zero is silent.
    01:00 am  one o'clock A M
  • If the minutes start with zero but are not 00, the zero is read as "oh".
    1:07 pm  one oh seven P M
  • Time on the hour is read as "o'clock".
    10:00  ten o'clock
  • If seconds are specified, the time is read with "and XX seconds".
    10:00:34 AM  ten o'clock and thirty four seconds A M
  • Ranges specify the time measurement after the number, and dashes are read as "to".
    2-4 pm  two to four P M
    2-4:30  two to four thirty
  • 0:00 is read as midnight.
    0:00  midnight
  • When a date and a time appear together, a pause occurs between the date and the time.
    July 17, 2008 12:30 pm  July seventeenth two thousand eight twelve thirty P M

Time duration generally is expressed in the following format:

<hour> <minute> "and" <seconds>

The TTS engine pronounces the duration the same way regardless of whether there is a space between the number and the time unit. For example, "5m 11 s" is pronounced the same as "5 m 11s".

If seconds are specified, there is an "and" between the minutes and seconds. For example, 10h 5m 11s is pronounced as "ten hours five minutes and eleven seconds".

Also, include at least two time units in a time duration. Otherwise, the TTS engine may not properly distinguish the context. For example, "5 m" could be five miles, five meters, or five minutes, whereas "5m 0s" is clearly five minutes.
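
As a small sketch of this recommendation, the prompt below writes a duration with three units so that the context is unambiguous. The surrounding sentence is illustrative only; the expected reading in the comment is the one given for this duration earlier in this section.

```xml
<prompt>
  <!-- Described above as read "ten hours five minutes and eleven seconds". -->
  The download will take about 10h 5m 11s.
</prompt>
```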

Pronunciation rules, each followed by example text and its pronunciation:

  • If a time unit is 00, it is read as zero.
    00m 5s  zero minutes and five seconds
  • If a time unit starts with zero, the zero is not pronounced.
    04h 10m  four hours ten minutes
  • If a time unit is 1, the unit is singular.
    1h 4m  one hour and four minutes
  • If the time duration modifies a noun, the time unit is singular.
    A 4 hour movie  a four hour movie

The following table lists the abbreviations for units of measurement, and the singular and plural forms of each as it is read. Some measurement abbreviations normally contain superscript text; the TTS engine pronounces most of these correctly. (Superscript is not supported in VoiceXML.)

The TTS engine pronounces the following measurement abbreviations when they are after a number.

abbrev expansion (singular) expansion (plural)
cc cubic centimeter cubic centimeters
cm centimeter centimeters
cm2 square centimeter square centimeters
cm3 cubic centimeter cubic centimeters
g gram grams
gl gallon gallons
Hz hertz hertz
kg kilogram kilograms
kHz kilohertz kilohertz
km kilometer kilometers
km/s kilometer per second kilometers per second
km/h kilometer per hour kilometers per hour
kn knot knots
l liter liters
m meter meters
m2 square meter square meters
m3 cubic meter cubic meters
MHz megahertz megahertz
min minute minutes
t ton tons

Pronunciation rules, each followed by example text and its pronunciation:

  • Measurements that modify a noun are read in the plural (even though this is grammatically incorrect).
    A 5 km walk  a five kilometers walk
  • Percentages and fractions can be negative, in which case the "-" character is read as "minus".
    -50%  minus fifty per cent
    -4/5  minus four fifths
  • Percentages and all measurements can be specified as ranges (with or without white space on either side of the "-").
    5-7%, 5 - 7%  five to seven percent
  • The abbreviation Mb is read as "megabytes".
    10 Mb  ten megabytes

  • Not supported: if the measurement abbreviation is distant from the number, the phrase may or may not be read correctly. For example: 2-8 million km.
  • Some infrequently used measurement abbreviations are not supported:
Abbreviation Potential readout
B byte
W watts
V Volts
Kb Kilobytes
Gb Gigabytes

Generally, the pronunciation of a special character is determined by the context in which it appears. The special characters below are pronounced as follows when they are not associated with other text (in other words, when they are surrounded by white space).

Symbol / Special Character Pronunciation
# number
$ dollar
~ tilde
` silence
' silence
" silence
? silence
| silence
\ backslash
/ slash
+ plus
- (minus) silence
_ underscore
= equal
),( silence
! silence
* asterisk
% percent
^ silence

To specify a less-than (<), greater-than (>), or ampersand (&) character, you must format them specially, because they are reserved characters in XML:

Less-than < &lt;
Greater-than > &gt;
Ampersand & &amp;
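
For instance, a prompt whose text contains an ampersand would be written with the entity. This is a minimal sketch; the prompt wording is illustrative only.

```xml
<prompt>
  <!-- &amp; stands for the ampersand character; a raw ampersand here would
       make the document ill-formed XML. -->
  Press 1 for sales &amp; support.
</prompt>
```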

The following table lists each symbol in the DARPA phonetic alphabet and its respective IPA VoiceXML value. To include phonemes in VoiceXML, use the IPA VoiceXML value. The 16-bit Unicode Value column may be used for reference. The Example column gives a word that contains the symbol's sound.

Note. Do not include spaces between IPA symbol values. If you do, the audio will not play correctly.

Darpa Symbol IPA VoiceXML Value 16-bit Unicode Value Example
aa &#x0251; U+0251 Bob
ae &#x00E6; U+00E6 bat
ah &#x028C; U+028C but
ao &#x0254; U+0254 bought
aw a&#x0361;&#x028A; U+0061 U+0361 U+028A down
ax &#x0259; U+0259 about
ay a&#x0361;i U+0061 U+0361 U+0069 bite
b b U+0062 bet
ch t&#x0361;&#x0283; U+0074 U+0361 U+0283 church
d d U+0064 dig
dh &#x00F0; U+00F0 that
dx d U+0064 butter
eh &#x025B; U+025B bet
em &#x0259;m U+0259 U+006D Chatham
en &#x0259;n U+0259 U+006E satin
er &#x025C; U+025C bird
ey e&#x0361;i U+0065 U+0361 U+0069 bait
f f U+0066 fog
g g U+0067 got
hh h U+0068 hot
ih &#x026A; U+026A bit
iy i U+0069 beat
jh d&#x0361;&#x0292; U+0064 U+0361 U+0292 jump
k k U+006B cat
l l U+006C lot
m m U+006D mom
n n U+006E nod
ng &#x014B; U+014B sing
ow o U+006F boat
oy &#x0254;&#x0361;i U+0254 U+0361 U+0069 boy
p p U+0070 pot
q t U+0074 button
r &#x027B; U+027B rat
s s U+0073 sit
sh &#x0283; U+0283 shut
t t U+0074 top
th &#x03B8; U+03B8 thick
uh &#x028A; U+028A book
uw u U+0075 boot
v v U+0076 vat
w w U+0077 won
y j U+006A you
z z U+007A zoo
zh &#x0292; U+0292 measure
0 not supported not supported Unstressed
1 &#x02C8; U+02C8 Primary stress
2 &#x02CC; U+02CC Secondary stress
& not supported not supported Word boundary
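
As a sketch of how the values in the table combine, the example below uses the SSML phoneme element with its alphabet attribute set to "ipa". Whether the TTS engine supports this element exactly as shown is an assumption; see the Speech Synthesis Markup Element Reference. The IPA values are concatenated without spaces, as the note above requires.

```xml
<prompt>
  <!-- DARPA "b ae t": b, then ae (&#x00E6;), then t. -->
  <phoneme alphabet="ipa" ph="b&#x00E6;t">bat</phoneme>
  <!-- DARPA "ax b aw t" with the primary stress mark (&#x02C8;) placed
       before the second syllable. -->
  <phoneme alphabet="ipa" ph="&#x0259;&#x02C8;ba&#x0361;&#x028A;t">about</phoneme>
</prompt>
```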

The following items are currently unsupported:

  • Slang
  • Infrequent abbreviations
  • Mathematical equations
See Also
Speech Synthesis Markup Element Reference, Using Spanish Text to Speech