Using Chinese Speech Recognition

The Tellme Voice Application Network supports speech recognition for the Chinese language. This article demonstrates how to access this functionality.

This example demonstrates how to recognize 'hello' in Chinese using an inline grammar. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

  <!--  
  Description:
  
  Simple recognition of Hello
  -->

    <form id="test">
        <field name="f1">
            <!-- Setting acoustic model to zh-cn.dnn -->
            <property name="tellme.acousticmodel" value="zh-cn.dnn"/>
            
            <!-- Ask user to say Ni Hao (hello) -->
            <prompt>Say <voice name="kangkang"> 你好</voice></prompt>

            <!-- Inline grammar that can recognize Ni Hao (hello) -->
            <grammar mode="voice" version="1.0" xml:lang="zh-cn"  root="main" tag-format="semantics/1.0">

                <rule id="main" scope="public">
                    <one-of>
                        <item weight="1.0">
                            你好
                            <tag>out = "hello";</tag>
                        </item>
                    </one-of>
                </rule>
            </grammar>

            <filled>
                <log>f1.utterance: <value expr="f1$.utterance"/></log>
              
                <!-- Check that the recognized utterance matches the expected input (hello) -->
                <if cond="f1$.utterance=='你好'">
                
                    <audio><voice name="kangkang"> 欢迎来到您的银行</voice></audio>
                </if>

            </filled>
        </field>

    </form>

</vxml>
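
If the caller stays silent or says something the grammar does not cover, the field in the example above is never filled. The sketch below shows one way to handle those cases with the standard VoiceXML <noinput> and <nomatch> event handlers. It reuses the acoustic model property, voice name, and grammar from the example; the limit of three attempts is an assumption chosen for illustration, not a Tellme recommendation.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

    <form id="test">
        <field name="f1">
            <property name="tellme.acousticmodel" value="zh-cn.dnn"/>

            <prompt>Say <voice name="kangkang">你好</voice></prompt>

            <grammar mode="voice" version="1.0" xml:lang="zh-cn" root="main" tag-format="semantics/1.0">
                <rule id="main" scope="public">
                    <item>
                        你好
                        <tag>out = "hello";</tag>
                    </item>
                </rule>
            </grammar>

            <!-- Caller said nothing: play the field prompt again -->
            <noinput>
                <reprompt/>
            </noinput>

            <!-- Caller said something outside the grammar: play the field prompt again -->
            <nomatch>
                <reprompt/>
            </nomatch>

            <!-- Assumed limit: end the session on the third silence or the third mismatch -->
            <noinput count="3">
                <exit/>
            </noinput>
            <nomatch count="3">
                <exit/>
            </nomatch>

            <filled>
                <log>f1.utterance: <value expr="f1$.utterance"/></log>
            </filled>
        </field>
    </form>

</vxml>

Event counters are kept separately for each event type, so a caller who alternates between silence and mismatches gets more than three attempts in total with this sketch.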




"I want to make a payment" is one of the most commonly used recognition phrases.

This example demonstrates how to recognize 'I want to make a payment' in Chinese using an inline grammar. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

  <!--  
  Description:
  
  Recognition of I want to make a payment
  -->

    <form id="test">
        <field name="f1">
            <!-- Setting acoustic model to zh-cn.dnn -->
            <property name="tellme.acousticmodel" value="zh-cn.dnn"/>
            
            <!-- Ask user to say wo xiang fu kuan (I want to make a payment) -->
            <prompt>Say <voice name="kangkang">我想付款</voice></prompt>

            <!-- Inline grammar that can recognize i want to make a payment -->
            <grammar mode="voice" version="1.0" xml:lang="zh-cn"  root="main" tag-format="semantics/1.0">

                <rule id="main" scope="public">
                    <one-of>
                       <item weight="1.0">
                            <!-- I want to make a payment -->
                            我想付款
                           <tag>out = "wo xiang fu kuan";</tag>
                       </item>
                    </one-of>
                 </rule>
            </grammar>

            <filled>
                <log>f1.utterance: <value expr="f1$.utterance"/></log>
              
                <!-- Check that the recognition result matches the expected input (I want to make a payment) -->
                <!-- This checks the interpretation (instead of the utterance), which can be plain English text -->
                <if cond="f1$.interpretation=='wo xiang fu kuan'">
                    <audio><voice name="kangkang">执行付款指示</voice></audio>
                </if>

            </filled>
        </field>

    </form>

</vxml>
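
Checking the interpretation rather than the utterance pays off once the grammar accepts more than one wording. The fragment below is a minimal sketch of that idea, meant to replace the grammar and filled elements of the example above. The two extra phrasings (我要付款, "I want to pay", and 付款, "payment") and the interpretation value make_payment are assumptions added for illustration.

<grammar mode="voice" version="1.0" xml:lang="zh-cn" root="main" tag-format="semantics/1.0">
    <rule id="main" scope="public">
        <one-of>
            <!-- I want to make a payment (from the example above) -->
            <item>我想付款</item>
            <!-- I want to pay (assumed alternative wording) -->
            <item>我要付款</item>
            <!-- Payment (assumed short form) -->
            <item>付款</item>
        </one-of>
        <!-- Every phrasing produces the same interpretation -->
        <tag>out = "make_payment";</tag>
    </rule>
</grammar>

<filled>
    <!-- The utterance differs per phrasing, so log it for tuning -->
    <log>f1.utterance: <value expr="f1$.utterance"/></log>

    <!-- A single check covers all phrasings listed in the grammar -->
    <if cond="f1$.interpretation=='make_payment'">
        <audio><voice name="kangkang">执行付款指示</voice></audio>
    </if>
</filled>

With this layout, adding a new phrasing only touches the grammar; the dialog logic in the filled block stays unchanged.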




DTMF recognition is still a common way to enter a credit card number or the last four digits of a Social Security number.

This example demonstrates how to recognize the last four digits of a Social Security number using an inline DTMF grammar. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<!-- Description:
Get last 4 digits of social through DTMF
-->

<form id="form0">
  <field name="f1">

  <!-- Ask user to input last 4 digits of social -->
  <prompt><voice name="kangkang">请使用键盘并输入社会保险号码的最后4位数字</voice></prompt>
  
  <grammar version="1.0" type="application/srgs+xml" xml:lang="zh-cn" mode="dtmf" root="test">
      <rule id="test" scope="public">
         <!-- This means there should be 4 DTMF inputs -->
         <item repeat="4">
              <one-of>
                <item>0</item>
                <item>1</item>
                <item>2</item>
                <item>3</item>
                <item>4</item>
                <item>5</item>
                <item>6</item>
                <item>7</item>
                <item>8</item>
                <item>9</item>
              </one-of>
         </item>
     </rule>
  </grammar>

   <filled>
     <!-- Using 1234 as the example key sequence -->
     <if cond="f1=='1234'">
       <audio><voice name="kangkang">社会保险号码的最后4位数字验证成功</voice></audio>
     </if>
   </filled>

  </field>
  
</form>

</vxml>
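
A fixed-length DTMF grammar only matches once all four keys have been pressed, so it helps to decide what happens when the caller pauses or enters too few digits. The fragment below is a sketch of field-level settings that could be added inside the field of the example above, using the standard VoiceXML properties inputmodes and interdigittimeout. The five-second value is an assumption for illustration, and platform defaults on the Tellme network may differ.

<field name="f1">
  <!-- Accept key presses only; speech input is not collected in this field -->
  <property name="inputmodes" value="dtmf"/>

  <!-- Assumed value: allow up to 5 seconds between key presses -->
  <property name="interdigittimeout" value="5s"/>

  <!-- The prompt and the DTMF grammar from the example above go here -->

  <!-- Fewer than 4 digits before the timeout does not match the grammar: ask again -->
  <nomatch>
    <reprompt/>
  </nomatch>

  <!-- No key pressed at all: ask again -->
  <noinput>
    <reprompt/>
  </noinput>

  <!-- The filled block from the example above goes here -->
</field>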



One of the most common use cases is recognition of a number and a currency, typically for credit card payments.

This example demonstrates how to recognize a number and a currency. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<!-- Description:
   Recognize a number and currency
-->

  <form id="test">
    <field name="f1">
        <property name="tellme.acousticmodel" value="zh-cn.dnn"/>
        <property name="confidencelevel" value="0.20" />
        
        <!-- Ask user to say five yen -->
        <prompt>Say <voice name="kangkang">五日元</voice></prompt>

        <grammar mode="voice" version="1.0" xml:lang="zh-cn"  root="main" tag-format="semantics/1.0">
          <rule id="main" scope="public">
            <one-of>
               <item weight="1.0">
                 <!-- This is five yen -->
                 五日元
                   <tag>out = "five yen";</tag>
               </item>
            </one-of>
         </rule>
      </grammar>


     <filled>
        <!-- Check if recognition matches with speech -->
        <if cond="f1$.utterance=='五日元'">
            <audio>Transferring <voice name="kangkang"><value expr="f1$.utterance"/> </voice>to savings account</audio>
        </if>

     </filled>
   </field>
  </form>
</vxml>
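
The grammar above matches one fixed phrase. To accept a range of amounts, the number and the currency can be split into separate rules that each return their own value. The fragment below is a sketch of that approach, meant to replace the grammar and filled elements of the example. It assumes the platform follows the W3C semantic interpretation syntax implied by tag-format="semantics/1.0" (including the rules.<name> reference); the rule names number and currency, the currency codes, and the added 美元 (US dollar) entry are illustrative choices.

<grammar mode="voice" version="1.0" xml:lang="zh-cn" root="main" tag-format="semantics/1.0">
    <!-- A number followed by a currency, returned as an object -->
    <rule id="main" scope="public">
        <ruleref uri="#number"/>
        <tag>out.amount = rules.number;</tag>
        <ruleref uri="#currency"/>
        <tag>out.currency = rules.currency;</tag>
    </rule>

    <!-- Single digits one through nine -->
    <rule id="number">
        <one-of>
            <item>一<tag>out = 1;</tag></item>
            <item>二<tag>out = 2;</tag></item>
            <item>三<tag>out = 3;</tag></item>
            <item>四<tag>out = 4;</tag></item>
            <item>五<tag>out = 5;</tag></item>
            <item>六<tag>out = 6;</tag></item>
            <item>七<tag>out = 7;</tag></item>
            <item>八<tag>out = 8;</tag></item>
            <item>九<tag>out = 9;</tag></item>
        </one-of>
    </rule>

    <!-- Currency names mapped to assumed currency codes -->
    <rule id="currency">
        <one-of>
            <item>日元<tag>out = "JPY";</tag></item>
            <item>美元<tag>out = "USD";</tag></item>
        </one-of>
    </rule>
</grammar>

<filled>
    <!-- Read the two parts from the interpretation object -->
    <log>amount: <value expr="f1$.interpretation.amount"/></log>
    <log>currency: <value expr="f1$.interpretation.currency"/></log>
</filled>

Saying 五日元 against this grammar would be expected to produce an interpretation with amount 5 and currency JPY, so the dialog no longer needs to compare against the literal Chinese utterance.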



This example demonstrates how to recognize a boolean answer, that is, yes or no. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<!-- Description
    Recognize boolean yes or no
-->

  <form id="test">
    <field name="f1">
       <property name="tellme.acousticmodel" value="zh-cn.dnn"/>
       <property name="confidencelevel" value="0.20" />

       <!-- Ask user to say yes or no -->
       <prompt><voice name="kangkang">是或否</voice></prompt>
       
       <!-- Inline grammar to recognize yes and no -->
       <grammar mode="voice" version="1.0" xml:lang="zh-cn"  root="main" tag-format="semantics/1.0">
            <rule id="main" scope="public">
                <one-of>
                    <item>
                       是
                       <tag>out = "yes";</tag>
                     </item>
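                    <item>
                        <!-- Accept 否 (the word used in the prompt) as well as 没有 -->
                        <one-of>
                            <item>否</item>
                            <item>没有</item>
                        </one-of>
                        <tag>out = "no";</tag>
                    </item>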
                </one-of>
            </rule>
        </grammar>

        <filled>
            <audio>You said <voice name="kangkang"><value expr="f1$.utterance"/></voice></audio>
        </filled>
   </field>

  </form>

</vxml>
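
The example above only reads the answer back. In a real dialog the application usually branches on the result, and with confidencelevel set as low as 0.20 it may also want to look at the recognition score first. The fragment below is a sketch of a filled block that could replace the one in the example. It uses the standard shadow variables f1$.confidence and f1$.interpretation; the 0.50 threshold and the log messages are assumptions for illustration.

<filled>
    <!-- Assumed threshold: treat scores below 0.50 as uncertain -->
    <if cond="f1$.confidence &lt; 0.50">
        <log>Low confidence: <value expr="f1$.confidence"/></log>
        <!-- Reset the field so that the caller is asked again -->
        <clear namelist="f1"/>
    <elseif cond="f1$.interpretation=='yes'"/>
        <log>Caller answered yes</log>
    <else/>
        <log>Caller answered no</log>
    </if>
</filled>

Branching on the interpretation ("yes" or "no") keeps the dialog logic independent of which Chinese wording the caller used.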
