Using Japanese Text to Speech

The Tellme Voice Application Network supports Japanese Text to Speech (TTS) processing. This article demonstrates how to access this functionality.

There are three Japanese voices, one male and two female:

  • Ichiro - Male (goes to the Microsoft cloud)
  • Ayumi - Female (goes to the Microsoft cloud)
  • Haruka - Female (internal voice)
The following sections show examples covering all three voices.

To access Japanese TTS functionality, set the name attribute of the voice element to "ichiro" (male), "ayumi" (female), or "haruka" (female), as shown in the following examples.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">
  <form><block><prompt>
        <voice name="ichiro">
          <!-- Hi, welcome to your bank. What would you like to do today? -->
        </voice>
  </prompt></block></form>
</vxml>

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.1">
  <form><block><prompt>
        <voice name="haruka">
          <!-- Would you like to check your balance? -->
        </voice>
  </prompt></block></form>
</vxml>
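When prompts are generated by a server-side script, the voice selection can be checked before the document is emitted. The following is a minimal Python sketch, not part of the platform: the function name `build_prompt_document`, the form/block/prompt wrapper, and the sample Japanese sentence are illustrative assumptions. Only the three voice names come from this article.

```python
import xml.etree.ElementTree as ET

# Voice names listed in this article.
JAPANESE_VOICES = {"ichiro", "ayumi", "haruka"}


def build_prompt_document(voice_name: str, text: str) -> str:
    """Return a VoiceXML 2.1 document that speaks `text` with `voice_name`.

    Hypothetical helper: the wrapper elements are an assumed minimal layout.
    """
    if voice_name not in JAPANESE_VOICES:
        raise ValueError(f"unknown Japanese voice: {voice_name!r}")
    vxml = ET.Element("vxml", {"version": "2.1"})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    voice = ET.SubElement(prompt, "voice", {"name": voice_name})
    voice.text = text
    # Serialize as text and prepend the XML declaration by hand.
    body = ET.tostring(vxml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative prompt text ("Would you like to check your balance?").
document = build_prompt_document("haruka", "残高を確認しますか。")
print(document)
```

Rejecting unknown names early avoids sending a misspelled voice name to the platform, where the fallback behavior is not described in this article.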

For information about the Speech Synthesis Markup Language (SSML) elements that the TTS engine supports, see the Speech Synthesis Markup Element Reference.

This section covers how phone numbers and mailing addresses should be formatted and how they are read by the TTS engine.

  • Brief time breaks occur between number segments.
  • Phone numbers are not pronounced in pairs, as regular numbers are; digits are read individually.
  • Hyphen delimiters between number segments are read as "no", as the examples below show. A comma indicates a short pause.
  • You can use the SSML say-as element to ensure that the TTS engine pronounces a phone number correctly.
Text           Pronunciation
03-2323-9801   "zero san no ni san ni san no kyuu hachi zero ichi"
011-323-9801   "zero ichi ichi no san ni san no kyuu hachi zero ichi"
0120-323-980   "zero ichi ni zero no san ni san no kyuu hachi zero"
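The readings in the table follow a simple pattern: each digit is read on its own and each hyphen is read as "no". The Python sketch below reproduces that pattern so expected readings can be written down for test phone numbers. It is an illustration only, not the TTS engine's implementation. The function name `read_phone_number` is hypothetical, and the readings for 4 through 7 are assumed from the address and currency examples elsewhere in this article.

```python
# Digit readings in the romanization used by this article.
# "yon", "go", "roku" and "nana" do not appear in the phone examples;
# they are assumed from the address and currency examples.
DIGIT_READINGS = {
    "0": "zero", "1": "ichi", "2": "ni", "3": "san", "4": "yon",
    "5": "go", "6": "roku", "7": "nana", "8": "hachi", "9": "kyuu",
}


def read_phone_number(text: str) -> str:
    """Return the expected romanized reading of a hyphenated phone number."""
    words = []
    for ch in text.strip():
        if ch in DIGIT_READINGS:
            words.append(DIGIT_READINGS[ch])   # digits are read one by one
        elif ch == "-":
            words.append("no")                 # hyphen between segments
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return " ".join(words)


print(read_phone_number("03-2323-9801"))
```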

  • Numbers in an address are read as numbers (for details, see the Numbers section).
  • Japanese addresses are typically in the following format:
     YUUBINBANGOU (Zip/Postal Code)
                        / \
       ,---------------'   '-----------,
      /      |                         |
     TO    FU (Metropolis)     KEN or DO (Prefecture)
       \              \          /     \
        \              \        /       \
         \              \      /         \
          \----------------------------, |
           \              \  /          \|
            \              \/            V
             \          SHI (City)    GUN (Rural area)
              \             /|           |
               \           / |           |
                \         /  |          /|
                 \       /   |_________/ |
                  \     /    /           |
               KU (Ward)    /            |
                    \      /             |
                     \    /              |
                      \  /               |
                       \/                |
                CHOU=MACHI (Town)     MURA (Village)
                         \               /
                          \             /
                   *CHOUME (District)  /
                                   \  /
                                   /  \
                                  /    \
                     *BANCHI (Block)  *BAN (Block)
                                 |      |
                                 |      |
                                  \   *GO (Building)
                                   \   /
                                    \ /
                               Building name
                           *GO (Room/Apt #, etc.)
  • To ensure that the TTS engine pronounces the state abbreviation correctly, be sure to include a zip code. Also, do not include extra spaces after the city name.
  • You can use the SSML say-as element to ensure that the TTS engine pronounces an address correctly.
Text  Pronunciation
321-0968栃木県宇都宮市中京泉5丁目37-16  "sanbyaku nijyuuichi reirei kyuu roku hachi tochigi-ken utsunomiya-shi chuu-kyou izumi go-chome san jyuu nana jyuuroku"
158-0083 東京都世田谷区奥沢1丁目41−12  "hyakugohachi reirei reirei hachi san tokyou-to setagaya-ku okusawa ichi-chome yonjyuuichi jyuuni"

Four-digit numbers have some common pronunciation patterns, as listed below. You can also use the SSML say-as element to ensure that the TTS engine pronounces a number digit by digit.

Pattern                      Pronunciation Rule        Example Text   Example Pronunciation
A simple four-digit number   Read as a single number   2351           "ni san go ichi"

Currency values are pronounced, in general, as <number> <currency unit> AND <number> <currency unit>. Yen amounts have a single unit, so the examples below are read as <number> "en". You can use the SSML say-as element to ensure that the TTS engine pronounces a currency value correctly.

Text             Pronunciation
¥1,235           "sen nihyaku sanjyuugo en"
¥2,000,000,000   "nijyuu oku en"
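The two yen readings in the table can be reproduced by splitting the amount into groups of four digits (man, oku, chou) and reading each group in turn. The Python sketch below does this as an aid for writing expected readings. It is not the engine's algorithm, and it has been checked only against the two examples in this article. The function names `read_group` and `read_yen`, the sound-change tables, and the reading of zero are assumptions. Forms such as "issen" before "man" are not handled.

```python
ONES = ["", "ichi", "ni", "san", "yon", "go", "roku", "nana", "hachi", "kyuu"]
# Standard sound changes, romanized in this article's style (assumed).
HUNDREDS = {1: "hyaku", 3: "sanbyaku", 6: "roppyaku", 8: "happyaku"}
THOUSANDS = {1: "sen", 3: "sanzen", 8: "hassen"}
GROUP_UNITS = ["", "man", "oku", "chou"]  # 10^0, 10^4, 10^8, 10^12


def read_group(n: int) -> str:
    """Read a value from 1 to 9999 as space-separated romaji words."""
    words = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, ones = divmod(rest, 10)
    if thousands:
        words.append(THOUSANDS.get(thousands, ONES[thousands] + "sen"))
    if hundreds:
        words.append(HUNDREDS.get(hundreds, ONES[hundreds] + "hyaku"))
    low = ""
    if tens:
        low += ("" if tens == 1 else ONES[tens]) + "jyuu"
    low += ONES[ones]  # tens and ones are joined, as in "sanjyuugo"
    if low:
        words.append(low)
    return " ".join(words)


def read_yen(amount_text: str) -> str:
    """Return the expected reading of a yen amount such as '¥1,235'."""
    value = int(amount_text.replace("¥", "").replace(",", "").strip())
    if value < 0 or value >= 10 ** 16:
        raise ValueError("amount out of range for this sketch")
    if value == 0:
        return "zero en"  # assumed reading, not shown in the article
    parts = []
    unit_index = 0
    while value:
        value, group = divmod(value, 10000)
        if group:
            unit = GROUP_UNITS[unit_index]
            parts.append(read_group(group) + (" " + unit if unit else ""))
        unit_index += 1
    return " ".join(reversed(parts)) + " en"


print(read_yen("¥1,235"))
print(read_yen("¥2,000,000,000"))
```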

For currency code abbreviations and how each one is read out, see Currency Abbreviations.

This section covers how the TTS engine pronounces date and time text. You can use the SSML say-as element to ensure that the TTS engine pronounces a date or time value correctly.

Note: Roman numerals in dates are not supported.

Dates in Japan are formatted as year month day (weekday).

Text          Pronunciation
20/12/31      "nisennijyuu-nen jyuuni-gatsu sanjyuuichi-nichi"
08年12月1日   "zero hachi nen jyuuni-gatsu tsuitachi"
08.12.31      "ni sen hachi nen jyuuni-gatsu sanjyuuichi-nichi"
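The table shows that a two-digit year can be read in different ways depending on the notation. One way to keep prompt text consistent is to generate date strings in year month day order from a real date value. The Python sketch below does that. The function name `format_japanese_date` is hypothetical. How the engine reads a four-digit year or the weekday suffix is not covered by the table above, so treat both as assumptions to verify.

```python
from datetime import date

# Weekday kanji, Monday first, matching the index from date.weekday().
WEEKDAY_KANJI = "月火水木金土日"


def format_japanese_date(d: date, with_weekday: bool = False) -> str:
    """Format a date as year, month, day, with an optional weekday suffix."""
    text = f"{d.year}年{d.month}月{d.day}日"
    if with_weekday:
        text += f"({WEEKDAY_KANJI[d.weekday()]})"
    return text


print(format_japanese_date(date(2008, 12, 1)))          # 2008年12月1日
print(format_japanese_date(date(2020, 12, 31), True))   # 2020年12月31日(木)
```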

While you can use all valid XML character sequences in the range U+0000 to U+FFFF in your VoiceXML documents, character data to be processed by the TTS engine (e.g. text in prompt and audio elements) must be non-control characters in the following Unicode tables:

The following is a list of known issues related to this language:

  • For say-as type "time", use the time format 11h02 for good results. Expect incorrect output for other formats.
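Time values taken from application data can be converted to the recommended 11h02 form before they are placed in a prompt. The Python sketch below shows one way to do this. The function name `format_time_for_say_as` is hypothetical, and zero-padding the hour is an assumption, because the article shows only a two-digit hour.

```python
from datetime import time


def format_time_for_say_as(t: time) -> str:
    """Return a time in the 11h02 form recommended above."""
    # Hour padding is an assumption; minutes are padded as in "11h02".
    return f"{t.hour:02d}h{t.minute:02d}"


print(format_time_for_say_as(time(11, 2)))  # 11h02
```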
See Also
Speech Synthesis Markup Element Reference, Unicode Code Charts