TTS Engine Behavior

Text-to-Speech (TTS) lets you develop applications without relying on pre-recorded audio for every voice prompt. You can also use TTS to change or add announcements in your voice applications quickly, without recording new audio. If you also use the [24]7 Experience Manager, you can make changes to your voice application without deploying new VoiceXML code.

For example, suppose your call center suddenly becomes unavailable to receive calls. You can quickly add a temporary TTS prompt that announces the issue and gives users an alternate number to call. When your call center is available again, you can remove the TTS announcement so that users hear the original recording. Without TTS, you must record and deploy a new audio file, which takes much longer.

In general, Tellme recommends that you supply TTS for all your audio. TTS also serves as a backup when an audio file fails to load or play.
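As a rough illustration of the backup behavior, the following VoiceXML fragment wraps TTS text in an audio element. In standard VoiceXML 2.0, the content of an audio element is rendered when the file named by src cannot be played. The file name welcome.wav is a placeholder chosen for this sketch, not a file referenced elsewhere in this document.

```xml
<prompt>
  <!-- welcome.wav is a hypothetical recording. If it cannot be fetched
       or played, the enclosed text is spoken by the TTS engine instead. -->
  <audio src="welcome.wav">
    Welcome to Tellme.
  </audio>
</prompt>
```

Keeping the TTS text identical to the recorded wording means callers hear the same message either way.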

1. Available Languages and Locales

The current version of the TTS engine supports the following combinations of languages and locales, and the voice names associated with each:

Language                Gender  Voice Name
US English              Female  zira
US English              Male    tom
Latin American Spanish  Female  teresa, hilda
UK English              Female  hazel
UK English              Male    george
Australia English       Female  hayley
Australia English       Male    james
Canada English          Female  heather
Spain Spanish           Female  helena
France French           Female  hortense
Canada French           Female  harmonie, isabelle
Canada French           Male    claude
Italian                 Female  lucia
German                  Female  hedda
Chinese                 Male    kangkang
Chinese                 Female  yaoyao
Japanese                Male    ichiro
Japanese                Female  ayumi
Brazilian Portuguese    Male    daniel

For example, to specify the voice name, enter the following within the audio element:

        <voice name="tom"> 
                Welcome to Tellme. 
        </voice>

After the end of the SSML voice element (</voice>), the TTS engine goes back to the default voice.

2. Supported Values for Voice Tag

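The current version of the TTS engine supports the following combinations of language (xml:lang), gender, and voice name. A hyphen (-) indicates that the attribute is not specified. The Expected Voice column shows the voice that the engine selects.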

xml:lang gender name Expected Voice
en-US female zira zira
en-US male tom tom
es-MX female teresa teresa
es-MX female hilda hilda
fr-CA female isabelle isabelle
fr-CA female harmonie harmonie
fr-CA male claude claude
fr-FR female hortense hortense
en-AU female hayley hayley
en-AU male james james
es-ES female helena helena
en-GB female hazel hazel
en-GB male george george
en-CA female heather heather
de-DE female hedda hedda
it-IT female lucia lucia
pt-BR male daniel daniel
zh-CN male kangkang kangkang
zh-CN female yaoyao yaoyao
ja-JP male ichiro ichiro
ja-JP female ayumi ayumi
en-US female - zira
en-US male - tom
es-MX female - teresa
fr-CA female - isabelle
fr-CA male - claude
en-AU female - hayley
en-AU male - james
es-ES female - helena
en-GB female - hazel
en-GB male - george
en-CA female - heather
de-DE female - hedda
it-IT female - lucia
pt-BR female - zira
zh-CN female - zira
ja-JP female - zira
pt-BR - daniel daniel
zh-CN - kangkang kangkang
zh-CN - yaoyao yaoyao
ja-JP - ichiro ichiro
ja-JP - ayumi ayumi
- male daniel daniel
- male kangkang kangkang
- female yaoyao yaoyao
- male yaoyao yaoyao
- female kangkang kangkang
- male ichiro ichiro
- female ayumi ayumi
- male ayumi ayumi
- female ichiro ichiro
- - zira zira
- - tom tom
- - teresa teresa
- - hilda hilda
- - isabelle isabelle
- - harmonie harmonie
- - claude claude
- - hayley hayley
- - james james
- - helena helena
- - hazel hazel
- - george george
- - heather heather
- - hedda hedda
- - lucia lucia
- - daniel daniel
- - kangkang kangkang
- - yaoyao yaoyao
- - ichiro ichiro
- - ayumi ayumi
en-US - - zira
es-MX - - teresa
fr-CA - - isabelle
fr-FR - - hortense
en-AU - - hayley
es-ES - - helena
en-GB - - hazel
en-CA - - heather
de-DE - - hedda
it-IT - - lucia
pt-BR - - zira
zh-CN - - zira
ja-JP - - zira
- female - zira
- male - tom
- - - zira
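To make the table concrete, here is a small sketch that selects one voice by locale and gender and another by name alone. According to the rows above, es-MX with female and no name should resolve to teresa, and name="george" with no other attributes should resolve to george. The prompt wrapper and the spoken text are chosen for illustration only; the same voice markup can be placed within an audio element.

```xml
<prompt>
  <!-- No name attribute: per the table, es-MX + female resolves to teresa. -->
  <voice xml:lang="es-MX" gender="female">
    Bienvenido a Tellme.
  </voice>
  <!-- Name only: per the table, name="george" resolves to george. -->
  <voice name="george">
    Welcome to Tellme.
  </voice>
</prompt>
```

Note that, per the table, a locale with no matching voice for the requested gender (for example, pt-BR with female) falls back to zira, so specifying the name is the more predictable choice.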

3. Supported SSML Elements

For information about the Speech Synthesis Markup Language (SSML) elements that the TTS engine supports, see the Speech Synthesis Markup Element Reference.

4. Acronyms

The TTS engine pronounces some acronyms as words and spells out others. To ensure the pronunciation you intend, Tellme recommends that you space out the letters of any acronym that should be read letter by letter, and that you test your TTS output before delivering your final application. For example, to have SAP read as individual letters, space out the letters:

S A P

5. Addresses and Phone Numbers

This section covers how email addresses, web addresses, file paths, phone numbers, and mailing addresses should be formatted and how they are read by the TTS engine.

5.1. Email Addresses

The format of an email address is alias@hostname. If a portion of the alias is recognized as a word, it is pronounced as a word; otherwise, its letters are spelled out. Digits are pronounced individually.

Valid email alias separators are ".", "-" and "_".

Text                              Pronunciation
No-reply@example.com              N O dash R E P L Y at example dot com
Paul70@example.com                P A U L seven zero at example dot com
mailto://webmaster@example.com    mailto colon slash slash webmaster at example dot com

5.2. Web Addresses

The format of a web address is either http://hostname.example.com or ftp://hostname.example.com. If the URL is word-based, the words are pronounced. Numbers in addresses are read digit by digit.

Text                             Pronunciation
www.cnn.com                      W W W dot C N N dot com
http://www.tellme.com            H T T P W W W dot tell me dot com
http://canada.gc.ca/home.html    H T T P Canada dot G C dot C A slash home dot H T M L

5.3. File Paths

File paths generally take one of the formats shown in the table below.

Spaces are valid only if the entire path is enclosed in quotes. For example, C:\My Documents\paper.doc is invalid, whereas "C:\My Documents\paper.doc" is valid.

Text                        Pronunciation
c:\user\documents           c colon backslash user backslash documents
\\computer\private\stash    backslash backslash computer backslash private backslash stash
\home\user-home             backslash home backslash user home

5.4. Phone Numbers

5.5. Mailing/Physical Addresses

Tellme supports both United States and international mailing address formats.

Pronunciation Rule                                              Text
A break occurs between a street address and a numeric street    5315 NE 22nd St
A break occurs between city/state and the ZIP code              Mountain View, CA 94041

Direction values can be any of the following; strings are not case-sensitive.

Input                            Pronunciation
ne, n.e, n.e., ne., northeast    northeast
nw, n.w, n.w., nw., northwest    northwest
se, s.e, s.e., se., southeast    southeast
sw, s.w, s.w., sw., southwest    southwest
n, n., no., north                North
s, s., so., south                South
e, e., east                      East
w, w., west                      West

Street abbreviations can be any of the following; strings are not case-sensitive. The notation (.) indicates an optional trailing period.

Input                                                Pronunciation
alley, aly(.)                                        alley
annex, annx(.), anx(.)                               annex
arcade, arc(.)                                       arcade
avenue, ave(.), avnue(.), av(.), avn(.)              avenue
bend                                                 bend
boulevard, blvd(.), boul(.), boulv(.), bv(.)         boulevard
bridge, brg(.)                                       bridge
brook, brk(.)                                        brook
bypass, byp(.), bypa(.)                              bypass
causeway, cswy(.)                                    causeway
center, ctr(.), cntr(.)                              center
circle, circ(.), circl(.)                            circle
court, ct(.)                                         court
creek, crk(.), ck(.)                                 creek
crescent, cres(.), crscnt(.), crsnt(.), crsent(.)    crescent
divide, div(.), dv(.)                                divide
drive, dr(.), dv(.), drv(.)                          drive
estate, est(.)                                       estate
expressway, exp(.), expy(.), expr(.), expw(.)        expressway
extension, ext(.)                                    extension
freeway, fwy(.), frwy(.)                             freeway
gateway, gatewy(.), gtwy(.), gtway(.)                gateway
highway, hiway(.), hwy(.), hiwy(.)                   highway
junction, jct(.), jctn(.)                            junction
lane, la(.), ln(.)                                   lane
mall                                                 mall
overpass                                             overpass
park, pk(.), prk(.)                                  park
parkway, pkway(.), pkwy(.)                           parkway
place, pl(.)                                         place
road, rd(.)                                          road
route, rte(.)                                        route
square, sq(.), sqr(.), sqre(.)                       square
street, st(.), strt(.), str(.)                       street
throughway, trwy(.)                                  throughway
turnpike, tpk(.), trnpk(.)                           turnpike
way, wy(.)                                           way

6. Numbers

Four-digit numbers follow some common pronunciation patterns, listed below. You can also use the SSML say-as element to ensure that the TTS engine pronounces a number digit by digit.

Note. To express multiplication, write out the operator in words. For example, use "4 times 5" instead of "4*5" or "4X5".

Pattern: Pronunciation Rule
    Example Text    Example Pronunciation

4-digit numbers without commas or decimal points: read as pairs
    2348            twenty three forty eight

4-digit numbers where the 2nd pair begins with zero: the 2nd pair is read as individual digits
    2304            twenty three zero four

4-digit numbers that begin with zero: read as individual digits
    0234            zero two three four

4-digit numbers where the 2nd pair is 00: read in hundreds
    1200            twelve hundred

4-digit numbers 2001 through 2009: read as a single number
    2008            two thousand eight

Additional pronunciation patterns:

Pattern: Pronunciation Rule
    Text          Pronunciation

Multiples: pronounced as an amount of 100's, 1000's, and so forth
    4000          four thousand
    50,000,000    fifty million

5 or more digits in sequence, unless it's a multiple: digits pronounced individually
    12345         one two three four five

Plural number (number with "s" or "'s" after it): pronounced as plural
    4536s         forty five thirty sixes

Ordinal numbers, such as 1st, 2nd, 14th, and so forth: pronounced as an ordinal
    13th          thirteenth
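The say-as approach mentioned at the start of this section might look like the sketch below. The interpret-as attribute comes from W3C SSML 1.0, and the value "characters" comes from the W3C note on say-as attribute values. Whether this TTS engine accepts that exact value is an assumption, so check the Speech Synthesis Markup Element Reference before relying on it. The prompt wording is invented for illustration.

```xml
<prompt>
  Your confirmation code is
  <!-- Assumed value: "characters" requests one symbol at a time,
       rather than the paired reading used for 4-digit numbers. -->
  <say-as interpret-as="characters">2348</say-as>.
  <!-- Multiplication is written out in words, as the note above advises. -->
  The total is 4 times 5.
</prompt>
```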

6.1. Fractions

Normal rules of fraction pronunciation apply.

Pronunciation Rule
    Text       Pronunciation

If the numerator is more than 3 digits long, or the denominator is more than 2 digits long, the fraction is pronounced as "number slash number".
    123/456    one hundred twenty three slash four hundred fifty six
    1234/56    twelve thirty four slash fifty six

6.2. Currency

Currency values are pronounced, in general, as <number> <currency unit> and <number> <currency subunit>. For example, $432.19 is pronounced as "four hundred thirty two dollars and nineteen cents." You can use the SSML say-as element to ensure that the TTS engine pronounces a currency value correctly.

Pronunciation Rule
    Text          Pronunciation

If the value before or after the decimal point is zero, only the non-zero value is read.
    $432.00       four hundred thirty two dollars
    $0.19         nineteen cents

Use m or b to indicate million or billion, respectively. Capitalization and spacing do not matter.
    $432M         four hundred thirty two million dollars
    $432.19 m     four hundred thirty two point one nine million dollars
    $432B         four hundred thirty two billion dollars
    $432.19 b     four hundred thirty two point one nine billion dollars

Ranges are pronounced with the currency value last.
    $2 - $4       two to four dollars
    $2 - 4m       two to four million dollars

Yen values are read with "yen" pronounced last. (Yen is the only currency without a name for amounts less than whole numbers.)
    JPY 123.45    one hundred twenty three point forty-five yen

Numbers with more than 2 digits after the decimal point have the decimal values read individually, with just the larger currency name.
    $12.3456      twelve point three four five six dollars

Use currency abbreviations.
    GBP 12.34     twelve pounds sterling and thirty four pence

The following table lists the currency code abbreviations and the readout for each.

Currency Code  Readout                                       Subdivision  Subdivision Readout
EUR            Euro                                          None         cent(s)
GBP            Pound(s) sterling                             p.           penny/pence
JPY            Japanese yen                                  None         none
USD            US dollar(s)                                  cent         cent(s)
ARS            Argentine peso(s)                             ARS 0.00     centavo(s)
CLP            Chilean peso(s)                               CLP 0.00     centavo(s)
COP            Colombian peso(s)                             COP 0.00     centavo(s)
CUP            Cuban peso(s)                                 CUP 0.00     centavo(s)
CUC            Cuban convertible peso(s)                     CUC 0.00     centavo(s)
MXN            Mexican peso(s)                               MXN 0.00     centavo(s)
UYU            Uruguayan peso(s)                             UYU 0.00     centesimo(s)
NIO            Nicaraguan cordoba(s)                         NIO 0.00     centavo(s)
DOP            Dominican peso(s)                             DOP 0.00     centavo(s)
CRC            Costa Rican colon/colones                     CRC 0.00     centimo(s)
SVC            El Salvadoran colon/colones (salvadoreno(s))  SVC 0.00     centavo(s)
GTQ            Guatemalan quetzal(es)                        GTQ 0.00     centavo(s)
PAB            Panamanian balboa(s)                          PAB 0.00     centesimo(s)
PYG            Paraguayan guarani(es)                        PYG 0.00     centimo(s)
PEN            Peruvian nuevo(s)                             PEN 0.00     centimo(s)
BOL            Bolivian boliviano(s)                         BOL 0.00     centavo(s)
VEB            Venezuelan bolivar(es)                        VEB 0.00     centavo(s)
7. Dates and Times

This section covers how the TTS engine pronounces date and time text. You can use the SSML say-as element to ensure that the TTS engine pronounces a date or time value correctly.

Note. Roman numerals in dates are not supported.

7.1. Dates

Dates can be formatted in multiple ways. Here are examples of the different formats:

Pronunciation Rule
    Text               Pronunciation

Days are read as ordinals (1st, 2nd, 12th, and so on).
    Feb 14 2009        February fourteenth two thousand nine

Decades are plural.
    1980s              nineteen eighties
    The 70s            the seventies

Dashes indicate duration; the TTS engine pronounces "to" between the numbers.
    Next 6-8 months    next six to eight months

Years ending in 00 are read as thousand.
    2/14/00            February fourteenth two thousand
7.2. Time

Time can be formatted in different ways. Below are examples of the different formats. In general, time is expressed in 12-hour format, with am and pm to indicate morning or evening.

Pronunciation Rule
    Text                      Pronunciation

Seconds are optional.
    12:14                     twelve fourteen
    12:14:13                  twelve fourteen and thirteen seconds

Morning and evening indicators are optional, can be capitalized or not, with or without periods.
    12:14 pm                  twelve fourteen P M
    12:14 PM                  twelve fourteen P M
    12:14 p.m.                twelve fourteen P M

Time zones are pronounced as acronyms and are optional.
    2:15 pm PST               two fifteen P M P S T

If the hour starts with zero, the zero is silent.
    01:00 am                  one o'clock A M

If the minutes start with zero but are not 00, the zero is read as "oh".
    1:07 pm                   one oh seven P M

Time on the hour is read as "o'clock".
    10:00                     ten o'clock

If seconds are specified, the time is read with "and XX seconds".
    10:00:34 AM               ten o'clock and thirty four seconds A M

Ranges specify the time measurement after the number, and dashes are read as "to".
    2-4 pm                    two to four P M
    2-4:30                    two to four thirty

0:00 is read as midnight.
    0:00                      midnight

When date and time are together, a pause occurs between the date and time.
    July 17, 2008 12:30 pm    July seventeenth two thousand eight twelve thirty P M

7.3. Time Duration

Time duration generally is expressed in the following format:

<hour> <minute> "and" <seconds>

The TTS engine pronounces the duration the same way regardless of whether there is a space between the number and the time unit. For example, "5m 11 s" is pronounced the same as "5 m 11s".

If seconds are specified, there is an "and" between the minutes and seconds. For example, 10h 5m 11s is pronounced as "ten hours five minutes and eleven seconds".

Include at least two time units in a duration. Otherwise, the TTS engine may not be able to determine the context. For example, "5 m" could mean five miles, five meters, or five minutes, whereas "5m 0s" is clearly five minutes.

Pronunciation Rule
    Text              Pronunciation

If a time unit is 00, it is read as zero.
    00m 5s            zero minutes and five seconds

If a time unit starts with zero, the zero is not pronounced.
    04h 10m           four hours ten minutes

If a time unit is 1, the unit is singular.
    1h 4m             one hour and four minutes

If the time duration modifies a noun, the time is singular.
    A 4 hour movie    a four hour movie
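The recommendation to include at least two time units can be applied directly in prompt text, as in this brief sketch. The prompt wording is invented for illustration.

```xml
<prompt>
  <!-- "5m" alone could be read as miles or meters. Adding "0s"
       gives the engine a second unit to establish a duration. -->
  Your estimated wait time is 5m 0s.
</prompt>
```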

8. Measurements

The following table lists the abbreviations for units of measurement, and the singular and plural forms in which each is read. Some measurement abbreviations are normally written with superscript text; because superscript is not supported in VoiceXML, write the exponent as a plain digit (for example, cm2). The TTS engine pronounces most of these correctly.

The TTS engine pronounces the following measurement abbreviations when they are after a number.

abbrev  expansion (singular)  expansion (plural)
cc      cubic centimeter      cubic centimeters
cm      centimeter            centimeters
cm2     square centimeter     square centimeters
cm3     cubic centimeter      cubic centimeters
g       gram                  grams
gl      gallon                gallons
Hz      hertz                 hertz
kg      kilogram              kilograms
kHz     kilohertz             kilohertz
km      kilometer             kilometers
km/s    kilometer per second  kilometers per second
km/h    kilometer per hour    kilometers per hour
kn      knot                  knots
l       liter                 liters
m       meter                 meters
m2      square meter          square meters
m3      cubic meter           cubic meters
MHz     megahertz             megahertz
min     minute                minutes
t       ton                   tons

Pronunciation Rule
    Text          Pronunciation

Measurements that modify a noun are read in the plural (even though this is grammatically incorrect).
    A 5 km walk   a five kilometers walk

Percentages and fractions can be negative, in which case the "-" character is read as "minus".
    -50%          minus fifty per cent
    -4/5          minus four fifths

Percentages and all measurements can be specified as ranges (with or without white space on either side of the "-").
    5-7%          five to seven percent
    5 - 7%        five to seven percent

Mb is read as "megabytes".
    10 Mb         ten megabytes

Abbreviation Potential readout
B byte
W watts
V Volts
Kb Kilobytes
Gb Gigabytes

9. Special Characters

Generally, the pronunciation of a special character is determined by the context in which it appears. The special characters below are pronounced as follows when they stand alone (in other words, when surrounded by white space).

Symbol / Special Character Pronunciation
# number
$ dollar
~ tilde
` silence
' silence
" silence
? silence
| silence
\ backslash
/ slash
+ plus
- (minus) silence
_ underscore
= equal
),( silence
! silence
* asterisk
% percent
^ silence

To specify a less-than (<), greater-than (>), or ampersand (&) character, use the corresponding entity reference, because these are reserved characters in XML:

Less-than < &lt;
Greater-than > &gt;
Ampersand & &amp;
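For instance, a prompt that contains an ampersand and a less-than sign could be written as in the sketch below. The prompt wording is invented for illustration, and how the engine reads each symbol in running text depends on context.

```xml
<prompt>
  <!-- In the source, "&" is written as &amp; and "<" as &lt;
       so that the VoiceXML document remains well-formed. -->
  Smith &amp; Sons: press 1 if your order total is &lt; 50 dollars.
</prompt>
```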

10. Phonemes

The following table lists each symbol in the DARPA phonetic alphabet and its respective IPA VoiceXML value. To include phonemes in VoiceXML, use the IPA VoiceXML value. The 16-bit Unicode Value column is provided for reference. The Example column gives a word that contains the symbol's sound.

Note. Do not include spaces between IPA symbol values. If you do, the audio does not play correctly.

DARPA Symbol IPA VoiceXML Value 16-bit Unicode Value Example
aa &#x0251; U+0251 Bob
ae &#x00E6; U+00E6 bat
ah &#x028C; U+028C but
ao &#x0254; U+0254 bought
aw a&#x0361;&#x028A; U+0061 U+0361 U+028A down
ax &#x0259; U+0259 about
ay a&#x0361;i U+0061 U+0361 U+0069 bite
b b U+0062 bet
ch t&#x0361;&#x0283; U+0074 U+0361 U+0283 church
d d U+0064 dig
dh &#x00F0; U+00F0 that
dx d U+0064 butter
eh &#x025B; U+025B bet
em &#x0259;m U+0259 U+006D Chatham
en &#x0259;n U+0259 U+006E satin
er &#x025C; U+025C bird
ey e&#x0361;i U+0065 U+0361 U+0069 bait
f f U+0066 fog
g g U+0067 got
hh h U+0068 hot
ih &#x026A; U+026A bit
iy i U+0069 beat
jh d&#x0361;&#x0292; U+0064 U+0361 U+0292 jump
k k U+006B cat
l l U+006C lot
m m U+006D mom
n n U+006E nod
ng &#x014B; U+014B sing
ow o U+006F boat
oy &#x0254;&#x0361;i U+0254 U+0361 U+0069 boy
p p U+0070 pot
q t U+0074 button
r &#x027B; U+027B rat
s s U+0073 sit
sh &#x0283; U+0283 shut
t t U+0074 top
th &#x03B8; U+03B8 thick
uh &#x028A; U+028A book
uw u U+0075 boot
v v U+0076 vat
w w U+0077 won
y j U+006A you
z z U+007A zoo
zh &#x0292; U+0292 measure
0 not supported not supported Unstressed
1 &#x02C8; U+02C8 Primary stress
2 &#x02CC; U+02CC Secondary stress
& not supported not supported Word boundary
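As an illustration of how the table values combine, the sketch below spells the word "bat" from the DARPA symbols b, ae, and t. It assumes the standard SSML phoneme element with alphabet="ipa" and a ph attribute; confirm the exact attributes that the TTS engine accepts in the Speech Synthesis Markup Element Reference.

```xml
<prompt>
  <!-- b + ae + t from the table: b, &#x00E6;, t, joined with no spaces. -->
  <phoneme alphabet="ipa" ph="b&#x00E6;t">bat</phoneme>
</prompt>
```

The text inside the element ("bat") is the ordinary spelling; standard SSML uses the ph value for pronunciation when the phoneme element is supported.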

11. Unsupported Text

The following items are currently unsupported:

See Also
Speech Synthesis Markup Element Reference, Using Spanish Text to Speech