Using Spanish Text to Speech

The Tellme Voice Application Network supports two female Mexican Spanish (Latin American Spanish) voices for Text to Speech (TTS) processing. This article demonstrates how to access this functionality.

To access Spanish TTS functionality, set the name attribute of the voice element to "teresa" or "hilda". The following example uses "teresa".

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="teresa">
          Desocupado lector: sin juramento me podrás creer 
          que quisiera que este libro, como hijo del entendimiento, 
          fuera el más hermoso, el más gallardo y
          más discreto que pudiera imaginarse.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>

You can also access Spanish TTS by setting only the xml:lang attribute of the voice element to "es-MX". In that case, the default voice, teresa, is used.

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice xml:lang="es-MX">
          Desocupado lector: sin juramento me podrás creer 
          que quisiera que este libro, como hijo del entendimiento, 
          fuera el más hermoso, el más gallardo y
          más discreto que pudiera imaginarse.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>
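
The second voice can presumably be selected in the same way. The following sketch assumes that "hilda" is accepted by the name attribute exactly as "teresa" is, and that a single prompt may contain more than one voice element, as SSML allows. The sample sentences are illustrative only.

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <!-- Assumption: each voice element selects one of the two Spanish voices -->
        <voice name="teresa">
          Hola, soy Teresa.
        </voice>
        <voice name="hilda">
          Y yo soy Hilda.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>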

For information about the Speech Synthesis Markup Language (SSML) elements that the TTS engine supports, see the Speech Synthesis Markup Element Reference.

This section covers how phone numbers and mailing addresses should be formatted and how they are read by the TTS engine.

  • Brief time breaks occur between number segments.
  • The country code and area code are read digit by digit. In the examples below, each four-digit segment is read as two two-digit numbers.
    Text                    Pronunciation
    +52 55 1083 7700        "Cinco dos cinco cinco diez ochenta y tres setenta y siete cero cero"
    +52 (55) 1083 7700      "Cinco dos cinco cinco diez ochenta y tres setenta y siete cero cero"
    +52 (44) (2094 5656)    "Cinco dos cuatro cuatro veinte noventa y cuatro cincuenta y seis cincuenta y seis"

  • Phone number delimiters are not pronounced.
  • You can use the SSML say-as element to ensure that the TTS engine pronounces a phone number correctly.
  • Numbers in an address are read as numbers (for details, see the Numbers section).
  • Addresses in Mexico are typically in the following format:
    Jaime Lopez  [recipient name]
    8th Straco # 69  [street address]
    46800 Puerto Vallarta, JAL   [postal code + city, province/state abbreviation]
    MEXICO 
    
  • To ensure that the TTS engine pronounces the state abbreviation correctly, include a postal code and do not add extra spaces after the city name.
  • You can use the SSML say-as element to ensure that the TTS engine pronounces an address correctly.
Pronunciation Rule                                               Text
A break occurs between a street address and a numeric street.    Paseo de la Reforma 500, Cuauhtémoc, Juarez
A break occurs between the city/state and the postal code.       06600 Ciudad de México, D.F., Mexico

Note. House numbers in the address are read digit by digit.
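
The say-as guidance for phone numbers and addresses might look like the following sketch. The interpret-as values "telephone" and "address" are assumptions based on common SSML usage and are not confirmed for this platform; check the Speech Synthesis Markup Element Reference for the values that the TTS engine actually accepts. The phone number and address are taken from the examples in this section.

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice xml:lang="es-MX">
          <!-- Assumption: "telephone" is a supported interpret-as value -->
          Llame al
          <say-as interpret-as="telephone">+52 55 1083 7700</say-as>.
          <!-- Assumption: "address" is a supported interpret-as value -->
          Nuestra dirección es
          <say-as interpret-as="address">Paseo de la Reforma 500, 06600 Ciudad de México, D.F.</say-as>
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>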

Four-digit numbers have some common pronunciation patterns, as listed below. You can also use the SSML say-as element to ensure that the TTS engine pronounces a number digit by digit.

Note. To express multiplication, you must write out the mathematical operator in words. For example, use "4 por 5" (4 times 5) instead of "4*5" or "4X5".

Pattern                                            Example Text    Example Pronunciation
4-digit number without commas or decimal points    4008            "Cuatro mil ocho"
4-digit number with a comma                        1,876           "Mil ochocientos setenta y seis"
4-digit number with a decimal point                1954.06         "Mil novecientos, cincuenta y cuatro punto cero seis"
7-digit number                                     3000000         "Tres millones"
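
As a sketch of the digit-by-digit reading and the written-out multiplication described above, a prompt might look like the following. The interpret-as value "digits" is an assumption (many SSML engines accept it, but it is not confirmed for this platform); check the Speech Synthesis Markup Element Reference before relying on it.

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice xml:lang="es-MX">
          <!-- Assumption: "digits" requests a digit-by-digit reading of 4008 -->
          Su código es
          <say-as interpret-as="digits">4008</say-as>.
          <!-- The operator is written out in words instead of "4*5" -->
          Cuatro por cinco son veinte.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>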

This section covers how the TTS engine pronounces date and time text. You can use the SSML say-as element to ensure that the TTS engine pronounces a date or time value correctly.

Note. Roman numerals in dates are not supported.

Dates in Mexico are formatted as dd/mm/yyyy.

Text                      Pronunciation
7/7/1977                  "Siete de Julio, de mil novecientos setenta y siete"
1984                      "Mil novecientos ochenta y cuatro"
El 8 de Enero, de 2014    "El ocho de Enero, de dos mil catorce"

Time can be formatted in several ways, as shown in the examples below. In general, time is expressed in 12-hour format, with am and pm indicating morning or afternoon/evening. For official purposes, 24-hour notation is used.

  • 12:14
  • 12:14:13
  • 12:14 pm
Text                                Pronunciation
13:01                               "Trece horas, un minuto"
1:00                                "Una hora"
El 17 de julio, de 2008 12:30 pm    "El diecisiete de Julio, del dos mil ocho, doce treinta P M"
14:03:04                            "Catorce horas tres minutos cuatro segundos"
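
The following sketch shows how say-as might be applied to a date and a time. The interpret-as values "date" and "time" and the format values "dmy" and "hms24" are assumptions taken from the W3C note on say-as attribute values; whether this platform supports them is not confirmed here, so check the Speech Synthesis Markup Element Reference.

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice xml:lang="es-MX">
          <!-- Assumption: format="dmy" marks the text as day/month/year -->
          Su cita es el
          <say-as interpret-as="date" format="dmy">7/7/1977</say-as>
          <!-- Assumption: format="hms24" marks the text as 24-hour time -->
          a las
          <say-as interpret-as="time" format="hms24">14:03:04</say-as>.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>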

While you can use all valid XML character sequences in the range U+0000 to U+FFFF in your VoiceXML documents, character data to be processed by the TTS engine (for example, text in prompt and audio elements) must consist of non-control characters from the Basic Latin and Latin-1 Supplement Unicode tables.

Specifically, these characters must be in the following ranges:

  • U+0020 to U+007E
  • U+00A0 to U+00FF

These ranges encode all the necessary characters for Spanish text.
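
These ranges coincide with the printable characters of ISO-8859-1, so Spanish letters and punctuation can be typed directly when the document declares that encoding, as the examples in this article do. As a sketch, assuming that the platform's XML parser resolves numeric character references in the standard way, the same characters can also be written as references. The prompt below is equivalent to "¿Dónde está la estación?".

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice xml:lang="es-MX">
          <!-- &#191; is U+00BF, &#243; is U+00F3, &#225; is U+00E1 -->
          &#191;D&#243;nde est&#225; la estaci&#243;n?
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>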

The following is a list of known issues related to this language:

  • "Peso" amounts are not read. Expect incorrect currency output for USD amounts.
  • Expect to hear "punto" (point) for the decimal point, and the word "dólares" (dollars) at the end of the amount. For example, $23.45 is spoken as "Veintitrés punto cuarenta y cinco dólares" (twenty-three point forty-five dollars).
  • If the currency amount is written as it would be in Latin America, with a comma in place of the decimal point, expect $23,45 to be read back as "Veintitrés dólares. Cuarenta y cinco".
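
One possible workaround, offered as an untested sketch rather than documented platform behavior, is to spell out currency amounts in words so that the TTS engine does not have to interpret the currency symbol or the decimal separator. The sentence below is illustrative only.

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice xml:lang="es-MX">
          <!-- The amount is written out in words instead of "$23.45" -->
          El total es de veintitrés pesos con cuarenta y cinco centavos.
        </voice>
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>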
See Also
Speech Synthesis Markup Element Reference, Unicode Code Charts