The Tellme Voice Application Network supports Chinese Text to Speech (TTS) processing with one male and two female Chinese voices. This article demonstrates how to access this functionality.
There are three Chinese voices:
- Kangkang - Male (uses the Microsoft cloud)
- Yaoyao - Female (uses the Microsoft cloud)
- Huihui - Female (Internal voice)
To access Chinese TTS functionality, set the name attribute of the voice element to "kangkang" (male), "yaoyao" (female), or "huihui" (female), as shown in the following examples.
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="kangkang" xml:lang="zh-CN">今天天气怎样</voice>
        <!-- how is the weather today? -->
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <voice name="huihui" xml:lang="zh-CN">今天天气怎样</voice>
        <!-- how is the weather today? -->
      </prompt>
      <exit/>
    </block>
  </form>
</vxml>
For information about the Speech Synthesis Markup Language (SSML) elements that the TTS engine supports, see the Speech Synthesis Markup Element Reference.
This section covers how phone numbers and mailing addresses should be formatted and how they are read by the TTS engine.
- Brief time breaks occur between number segments.
- Phone numbers are not pronounced in pairs, as regular numbers are; digits are read individually.
- Phone number delimiters are not pronounced. A comma indicates a short pause.
- You can use the SSML say-as element to ensure that the TTS engine pronounces a phone number correctly.
|011 – 0086 – 10 – 67160201||"líng yī yī líng líng bā liù yī líng liù qī yī liù líng èr líng yī"|
|0 – 10 – 67160201||"líng yī líng liù qī yī liù líng èr líng yī"|
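The say-as approach mentioned above might look like the following fragment, which would sit inside a block in a document such as the earlier examples. This is only a sketch: the interpret-as attribute comes from W3C SSML 1.0, and the value "telephone" is an assumption that should be checked against the Speech Synthesis Markup Element Reference.

```xml
<prompt>
  <voice name="kangkang" xml:lang="zh-CN">
    <!-- Assumption: interpret-as="telephone" is accepted by the platform. -->
    <say-as interpret-as="telephone">0 – 10 – 67160201</say-as>
  </voice>
</prompt>
```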
- Numbers in an address are read as numbers (for details, see the Numbers section)
- Chinese addresses are typically in the following format:
Country, Postal Code Province, City, District, Street Name or Road Name with Street Number or Road Number, Building Name or Number, Room Number Recipient
- To ensure that the TTS engine pronounces the province correctly, be sure to include a postal code. Also, do not include extra spaces after the city name.
- You can use the SSML say-as element to ensure that the TTS engine pronounces an address correctly.
|台湾省 台北市 信义路 5段 7号（台北101）||tái wān shěng tái běi shì xìn yì lù wǔ duàn qī hào tái běi yī líng yī|
|香港 东路 六号，5号楼，8号室||xiāng gǎng dōng lù liù hào， wǔ hào lóu ， bā hào shì|
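As a hedged illustration of the say-as option for addresses, the fragment below wraps one of the sample addresses from the table. The interpret-as value "address" is an assumption, not a confirmed platform value; consult the Speech Synthesis Markup Element Reference for the supported set.

```xml
<prompt>
  <voice name="huihui" xml:lang="zh-CN">
    <!-- Assumption: interpret-as="address" is accepted by the platform. -->
    <say-as interpret-as="address">香港 东路 六号，5号楼，8号室</say-as>
  </voice>
</prompt>
```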
Four-digit numbers have some common pronunciation patterns, as listed below. You can also use the SSML say-as element to ensure that the TTS engine pronounces a number digit by digit.
|Pattern||Pronunciation Rule||Example Text||Example Pronunciation|
|A simple 10-digit number||Read as a single number||1234567890||"yī èr sān sì wǔ liù qī bā jiǔ líng"|
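To request a digit-by-digit reading explicitly, a fragment along the following lines could be used. It is a minimal sketch: the interpret-as value "digits" is assumed here and may differ on this platform, so verify it in the Speech Synthesis Markup Element Reference.

```xml
<prompt>
  <voice name="yaoyao" xml:lang="zh-CN">
    <!-- Assumption: interpret-as="digits" requests a digit-by-digit reading. -->
    <say-as interpret-as="digits">1234567890</say-as>
  </voice>
</prompt>
```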
Currency values are pronounced, in general, as <number> <currency value> AND <number> <currency value>. You can use the SSML say-as element to ensure that the TTS engine pronounces a currency value correctly.
|¥1,235||"yì qiān èr bǎi sān shí wǔ yuán"|
|¥2,000,000,000||"èr shí yì yuán"|
For the currency code abbreviations and the readout for each, see Currency Abbreviations.
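A currency value could be marked up as in the sketch below, which reuses the first amount from the table. The interpret-as value "currency" is an assumption based on common SSML usage and is not confirmed for this platform.

```xml
<prompt>
  <voice name="kangkang" xml:lang="zh-CN">
    <!-- Assumption: interpret-as="currency" is accepted by the platform. -->
    <say-as interpret-as="currency">¥1,235</say-as>
  </voice>
</prompt>
```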
This section covers how the TTS engine pronounces date and time text. You can use the SSML say-as element to ensure that the TTS engine pronounces a date or time value correctly.
Note: Roman numerals in dates are not supported.
Dates in China are formatted as year, month, day.
|2017/09/01||"èr líng yī qī nián jiǔ yuè yī rì"|
|2017.09.01||"èr líng yī qī nián jiǔ yuè yī rì"|
|08.12.31||"èr líng líng bā nián shí èr yuè sān shí yī rì"|
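The following fragment sketches how a date in year, month, day order might be marked up with say-as. The interpret-as value "date" and the format value "ymd" follow the W3C say-as conventions and are assumptions here; confirm both in the Speech Synthesis Markup Element Reference before relying on them.

```xml
<prompt>
  <voice name="huihui" xml:lang="zh-CN">
    <!-- Assumption: interpret-as="date" with format="ymd" is accepted by the platform. -->
    <say-as interpret-as="date" format="ymd">2017/09/01</say-as>
  </voice>
</prompt>
```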
While you can use all valid XML character sequences in the range U+0000 to U+FFFF in your VoiceXML documents, character data to be processed by the TTS engine (e.g. text in prompt and audio elements) must be non-control characters in the following Unicode tables:
The following is a list of known issues related to this language:
- For say-as type "time" use time format 11h02 for good results. Expect incorrect outputs for other formats.
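A time value in the recommended 11h02 format might be written as in the sketch below. The known issue above names the say-as type "time"; expressing it through the interpret-as attribute is an assumption, so check the exact attribute spelling in the Speech Synthesis Markup Element Reference.

```xml
<prompt>
  <voice name="yaoyao" xml:lang="zh-CN">
    <!-- Assumption: the "time" type is selected with interpret-as="time". -->
    <say-as interpret-as="time">11h02</say-as>
  </voice>
</prompt>
```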
|Speech Synthesis Markup Element Reference, Unicode Code Charts|