Using Spanish Text to Speech

The Tellme Voice Application Network supports a female Spanish (Spain) voice for Text to Speech (TTS) processing. This article demonstrates how to access this functionality.

To access Spanish TTS functionality, set the name attribute of the voice element to "helena" as shown in the following example.

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="helena">
          Desocupado lector: sin juramento me podrás creer 
          que quisiera que este libro, como hijo del entendimiento, 
          fuera el más hermoso, el más gallardo y
          más discreto que pudiera imaginarse.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>

For information about the Speech Synthesis Markup Language (SSML) elements that the TTS engine supports, see the Speech Synthesis Markup Element Reference.

This section covers how phone numbers and mailing addresses should be formatted and how they are read by the TTS engine.

  • Brief time breaks occur between number segments.
  • Phone numbers are not pronounced in pairs, as regular numbers are; digits are read individually.
    Text | Pronunciation
    +52 55 1083 7700 | "Más quinientos veinticinco, cincuenta y uno, cero, ocho, tres, siete siete cero cero."
    +52 (55) 1083 7700 | "Más quinientos veinticinco, cincuenta y uno, cero, ocho, tres, siete siete cero cero."
    +52 (44) (2094 5656) | "Cinco, dos, cuatro, cuatro, doscientos nueve, cuarenta y cinco, sesenta y cinco, seis"

  • Phone number delimiters are not pronounced.
  • You can use the SSML say-as element to ensure that the TTS engine pronounces a phone number correctly.
  • Numbers in an address are read as numbers (for details, see the Numbers section).
  • Addresses in Spain are typically in the following format:
    Iñigo Montoya [recipient]
    Calle Aduana, 29 [street name + house/building number]
    28070 MADRID [postal code + city/town/locality]
    
  • To ensure that the TTS engine pronounces the state abbreviation correctly, be sure to include a zip code. Also, do not include extra spaces after the city name.
  • You can use the SSML say-as element to ensure that the TTS engine pronounces an address correctly.
Pronunciation Rule | Text
Between a street address and a numeric street, a break occurs | Paseo de la Reforma 500, Cuauhtémoc, Juarez
A break occurs between city/state and the zip code | 06600 Ciudad de México, D.F., Mexico

Note. House numbers in the address are read digit by digit.
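
As a sketch of the say-as approach for phone numbers described in this section, the following document wraps a number in a say-as element. It assumes the SSML 1.0 interpret-as attribute and the value "telephone" suggested in the W3C Note on say-as attribute values. This article does not state which attribute names and values the platform accepts, so check them against the Speech Synthesis Markup Element Reference before relying on this.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="helena">
          <!-- Assumption: interpret-as="telephone" is accepted by the platform -->
          Llame al
          <say-as interpret-as="telephone">+52 55 1083 7700</say-as>.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>
```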

  • Four-digit numbers have some common pronunciation patterns, as listed below. You can also use the SSML say-as element to ensure that the TTS engine pronounces a number digit by digit.
  • The decimal separator is a comma, whereas the thousands separator is a period.

Note. To express multiplication, you must write out the mathematical functions. For example, use "4 times 5" instead of "4*5" or "4X5".

Pattern | Pronunciation Rule | Example Text | Example Pronunciation
4-digit numbers without commas or decimal points | Read as pairs | 2348 | "dos mil trescientos cuarenta y ocho"
4-digit numbers where the 2nd pair begins with zero | 2nd pair is read as individual digits | 2304 | "dos mil trescientos cuatro"
4-digit numbers that begin with zero | Read as pair | 0234 | "cero dos cientos treinta y cuatro"
4-digit numbers where the 2nd pair is 00 | Read in hundreds | 1200 | "mil dos cientos"
4-digit numbers 2001 through 2009 | Read as a single number | 2008 | "dos mil ocho"
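
To illustrate the difference between the default reading and a forced digit-by-digit reading, the sketch below speaks the same four-digit value twice. The interpret-as value "characters" follows the W3C Note on say-as attribute values and is an assumption here; some SSML engines use a value such as "digits" instead, so confirm the supported value in the Speech Synthesis Markup Element Reference.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="helena">
          <!-- Default reading, per the table above: "dos mil trescientos cuarenta y ocho" -->
          Su saldo es 2348.
          <!-- Assumption: interpret-as="characters" requests a digit-by-digit reading -->
          Su clave es <say-as interpret-as="characters">2348</say-as>.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>
```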

This section covers how the TTS engine pronounces date and time text. You can use the SSML say-as element to ensure that the TTS engine pronounces a date or time value correctly.

Note. Roman numerals in dates are not supported.

Dates in Mexico are formatted as dd/mm/yyyy.

Text | Pronunciation
7/7/1977 | "siete de Julio, de mil novecientos setenta y siete"
1984 | "Mil novecientos ochenta y cuatro"
El 8 de Enero, de 2014 | "El ocho de Enero, de dos mil catorce"

Time can be formatted in different ways. Below are examples of the different formats. In general, time is expressed in 12-hour format, with am and pm to indicate morning or evening. For official purposes, 24-hour time notation is used.

  • 12:14
  • 12:14:13
  • 12:14 pm
Text | Pronunciation
0:00 | "Doce en punto"
El 17 de Julio, de 2008 12:30 pm | "El diecisiete de Julio, de dos mil ocho, doce y media P M"
14:03:04 | "Catorce cero tres y cuatro segundos"
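
The say-as hints for dates and times might be written as in the sketch below. The interpret-as values "date" and "time" and the format values "dmy" and "hms24" come from the W3C Note on say-as attribute values. Whether the platform honors them is an assumption, so verify against the Speech Synthesis Markup Element Reference.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="helena">
          <!-- Assumption: format="dmy" marks the day/month/year order described above -->
          La cita es el <say-as interpret-as="date" format="dmy">8/1/2014</say-as>
          <!-- Assumption: format="hms24" marks a 24-hour time with seconds -->
          a las <say-as interpret-as="time" format="hms24">14:03:04</say-as>.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>
```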

While you can use all valid XML character sequences in the range U+0000 to U+FFFF in your VoiceXML documents, character data to be processed by the TTS engine (e.g. text in prompt and audio elements) must be non-control characters in the Basic Latin and Latin-1 Supplement Unicode tables.

Specifically, these characters must be in the following ranges:

  • U+0020 to U+007E
  • U+00A0 to U+00FF

These ranges encode all the necessary characters for Spanish text.
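
As an illustration of staying within these ranges, the sketch below writes the same Spanish sentence twice: once with literal Latin-1 characters, which requires the file to be saved in the declared iso-8859-1 encoding, and once with standard XML numeric character references, which keep the file itself pure ASCII. The sentence is a made-up example. An XML parser resolves the references before the text reaches the TTS engine, so both prompts should carry identical character data.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="helena">
          <!-- Literal characters, all within U+00A0 to U+00FF -->
          ¿Dónde está el señor Iñigo?
          <!-- Same text using references: U+00BF, U+00F3, U+00E1, U+00F1 -->
          &#xBF;D&#xF3;nde est&#xE1; el se&#xF1;or I&#xF1;igo?
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>
```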

The following is a list of known issues related to this language:

  • Expect incorrect output for say-as type "currency" because it reads any value after the decimal separator as single digits.
  • Expect to hear the word "comma" and the currency "euros" at incorrect locations.
  • Euro and USD currency amounts are not read correctly.
  • The currency symbol $ is read out as "dolars" instead of the Spanish word for dollars, "dólares".
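
One possible workaround for the currency issues above, offered here as an assumption rather than a documented recommendation, is to write the amount out in words so that the engine does not have to interpret a currency symbol or decimal separator. A minimal sketch:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="helena">
          <!-- Amount written in words instead of digits with a currency symbol or say-as -->
          El total es veinticinco euros con cincuenta céntimos.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>
```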
See Also
Speech Synthesis Markup Element Reference, Unicode Code Charts