Using Japanese Speech Recognition

The Tellme Voice Application Network supports speech recognition for the Japanese language. This article demonstrates how to access this functionality.

This example demonstrates how to recognize 'hello' in Japanese using an inline grammar. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

  <!--  
  Description:
  
  Simple recognition with ja-jp male (Input - konnichiwa)
  -->

    <form id="test">
        <field name="f1">
            <!-- Setting acoustic model to ja-jp.dnn -->
            <property name="tellme.acousticmodel" value="ja-jp.dnn"/>
            
            <!-- Ask user to say konnichiwa (hello) -->
            <prompt>Say <voice name="ichiro"> こんにちは</voice></prompt>

            <!-- Inline grammar that can recognize konnichiwa (hello) -->
            <grammar mode="voice" version="1.0" xml:lang="ja-jp"  root="main" tag-format="semantics/1.0">

                <rule id="main" scope="public">
                    <one-of>
                        <item weight="1.0">
                            こんにちは
                            <tag>out = "hello";</tag>
                        </item>
                    </one-of>
                </rule>
            </grammar>

            <filled>
                <log>f1.utterance: <value expr="f1$.utterance"/></log>
              
                <!-- Check that the recognition result matches the expected input (konnichiwa) -->
                <if cond="f1$.utterance=='こんにちは'">
                
                    <audio><voice name="ichiro"> あなたの銀行へようこそ</voice></audio>
                </if>

            </filled>
        </field>

    </form>

</vxml>
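
The inline grammar above could also live in its own file so that several fields or pages can share it. The following is a minimal sketch, not a tested Tellme configuration: the file name hello-ja.grxml and its location next to the VoiceXML page are hypothetical, and the rule body is copied from the example above.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- hello-ja.grxml (hypothetical file name): standalone SRGS grammar -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
  mode="voice" version="1.0" xml:lang="ja-jp" root="main" tag-format="semantics/1.0">

    <rule id="main" scope="public">
        <one-of>
            <item>
                こんにちは
                <tag>out = "hello";</tag>
            </item>
        </one-of>
    </rule>
</grammar>
```

Under that assumption, the field would replace its inline grammar element with a reference such as <grammar src="hello-ja.grxml" type="application/srgs+xml"/> and keep the tellme.acousticmodel property unchanged.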




'I want to make a payment' is one of the most commonly recognized phrases.

This example demonstrates how to recognize 'I want to make a payment' in Japanese using an inline grammar. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

  <!--  
  Description:
  
  Recognition of I want to make a payment with ja-jp male (Input - Watashi wa shiharai o shitai)
  -->

    <form id="test">
        <field name="f1">
            <!-- Setting acoustic model to ja-jp.dnn -->
            <property name="tellme.acousticmodel" value="ja-jp.dnn"/>
            
            <!-- Ask user to say Watashi wa shiharai o shitai (I want to make a payment) -->
            <prompt>Say <voice name="ichiro">私は支払いをしたい</voice></prompt>

            <!-- Inline grammar that can recognize Watashi wa shiharai o shitai (i want to make a payment) -->
            <grammar mode="voice" version="1.0" xml:lang="ja-jp"  root="main" tag-format="semantics/1.0">

                <rule id="main" scope="public">
                    <one-of>
                       <item weight="1.0">
                            <!-- I want to make a payment -->
                            私は支払いをしたい
                           <tag>out = "Watashi wa shiharai o shitai";</tag>
                       </item>
                    </one-of>
                 </rule>
            </grammar>

            <filled>
                <log>f1.utterance: <value expr="f1$.utterance"/></log>
              
                <!-- Check that the recognition result matches the expected input (I want to make a payment) -->
                <!-- This checks the interpretation (instead of the utterance), which can be plain romanized or English text -->
                <if cond="f1$.interpretation=='Watashi wa shiharai o shitai'">
                    <audio><voice name="ichiro">支払い指示の実行</voice></audio>
                </if>

            </filled>
        </field>

    </form>

</vxml>
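
Callers rarely repeat a phrase word for word. As a sketch of how the grammar above might tolerate a shorter phrasing, the rule below makes the leading 私は (watashi wa) optional with the SRGS repeat attribute, so that both 私は支払いをしたい and 支払いをしたい return the same interpretation. The label make_payment is a hypothetical value chosen for illustration, and how the recognizer segments unspaced Japanese text into tokens is platform-dependent, so treat this as untested on the Tellme platform.

```xml
<grammar mode="voice" version="1.0" xml:lang="ja-jp" root="main" tag-format="semantics/1.0">
    <rule id="main" scope="public">
        <!-- Optional subject: 私は (watashi wa) may be spoken zero or one time -->
        <item repeat="0-1">私は</item>
        <!-- Required phrase: 支払いをしたい (shiharai o shitai) -->
        <item>支払いをしたい</item>
        <!-- Hypothetical label returned for either phrasing -->
        <tag>out = "make_payment";</tag>
    </rule>
</grammar>
```

With this grammar, the filled block would compare f1$.interpretation against 'make_payment' rather than against the romanized sentence.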




DTMF recognition is still a common way to enter a credit card number or the last four digits of a Social Security number.

This example demonstrates how to recognize the last four digits of a Social Security number using an inline DTMF grammar. Comments are provided in the example VoiceXML.

<?xml version="1.0"?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<!-- Description:
Get last 4 digits of social through DTMF
-->

<form id="form0">
  <field name="f1">

  <!-- Ask user to input last 4 digits of social -->
  <prompt><voice name="ichiro">キーパッドを使用して、あなたの社会保障番号の最後の4桁を入力してください</voice></prompt>
  
  <grammar version="1.0" type="application/srgs+xml" xml:lang="ja-JP" mode="dtmf" root="test">
      <rule id="test" scope="public">
         <!-- This means there should be 4 DTMF inputs -->
         <item repeat="4">
              <one-of>
                <item>0</item>
                <item>1</item>
                <item>2</item>
                <item>3</item>
                <item>4</item>
                <item>5</item>
                <item>6</item>
                <item>7</item>
                <item>8</item>
                <item>9</item>
              </one-of>
         </item>
     </rule>
  </grammar>

   <filled>
     <!-- Using 1234 as an example key sequence -->
     <if cond="f1=='1234'">
       <audio><voice name="ichiro">ソーシャルの最後の4桁が正常に検証されました</voice></audio>
     </if>
   </filled>

  </field>
  
</form>

</vxml>
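
Keypad entry usually needs some handling for callers who pause or press the wrong number of keys. The fragment below is a sketch of how the field above might be extended with the standard VoiceXML inputmodes and interdigittimeout properties and with nomatch and noinput handlers. The 3-second timeout and the wording of the retry prompt are illustrative assumptions, not values taken from the Tellme documentation.

```xml
<field name="f1">
  <!-- Accept keypad input only for this field -->
  <property name="inputmodes" value="dtmf"/>
  <!-- Wait up to 3 seconds between key presses (value chosen for illustration) -->
  <property name="interdigittimeout" value="3s"/>

  <!-- Prompt and DTMF grammar as in the example above -->

  <!-- Caller pressed keys that do not match the grammar -->
  <nomatch>
    <!-- "Please enter a four-digit number" -->
    <audio><voice name="ichiro">4桁の数字を入力してください</voice></audio>
    <reprompt/>
  </nomatch>

  <!-- Caller pressed nothing before the timeout -->
  <noinput>
    <reprompt/>
  </noinput>

  <!-- Filled block as in the example above -->
</field>
```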



One of the most common use cases is recognition of a number and a currency, typically for credit card payments.

This example demonstrates how to recognize a number and a currency. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<!-- Description:
   Recognize a number and currency
-->

  <form id="test">
    <field name="f1">
        <property name="tellme.acousticmodel" value="ja-jp.dnn"/>
        <property name="confidencelevel" value="0.20" />
        
        <!-- Ask user to say five yen -->
        <prompt>Say <voice name="ichiro">五円</voice></prompt>

        <grammar mode="voice" version="1.0" xml:lang="ja-jp"  root="main" tag-format="semantics/1.0">
          <rule id="main" scope="public">
            <one-of>
               <item weight="1.0">
                 <!-- This is five yen -->
                 五円
                   <tag>out = "five yen";</tag>
               </item>
            </one-of>
         </rule>
      </grammar>


     <filled>
        <!-- Check if recognition matches with speech -->
        <if cond="f1$.utterance=='五円'">
            <audio>Transferring <voice name="ichiro"><value expr="f1$.utterance"/> </voice>to savings account</audio>
        </if>

     </filled>
   </field>
  </form>
</vxml>
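
The grammar above accepts a single amount. As a sketch of how it might be widened, the version below lists a few amounts and returns a structured result with an amount and a currency code instead of a fixed string. The property names amount and currency and the choice of amounts are hypothetical, and the grammar has not been verified against the Tellme recognizer.

```xml
<grammar mode="voice" version="1.0" xml:lang="ja-jp" root="main" tag-format="semantics/1.0">
  <rule id="main" scope="public">
    <one-of>
      <item>
        <!-- one yen -->
        一円
        <tag>out.amount = 1; out.currency = "JPY";</tag>
      </item>
      <item>
        <!-- five yen -->
        五円
        <tag>out.amount = 5; out.currency = "JPY";</tag>
      </item>
      <item>
        <!-- ten yen -->
        十円
        <tag>out.amount = 10; out.currency = "JPY";</tag>
      </item>
      <item>
        <!-- one hundred yen -->
        百円
        <tag>out.amount = 100; out.currency = "JPY";</tag>
      </item>
    </one-of>
  </rule>
</grammar>
```

Inside the filled block, the numeric value would then be available as f1$.interpretation.amount, which avoids comparing against the Japanese utterance text for every supported amount.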



This example demonstrates how to recognize a boolean response, i.e. yes or no. Comments are provided in the example VoiceXML.

<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
  xmlns="http://www.w3.org/2001/vxml">

<!-- Description
    Recognize boolean yes or no
-->

  <form id="test">
    <field name="f1">
       <property name="tellme.acousticmodel" value="ja-jp.dnn"/>
       <property name="confidencelevel" value="0.20" />

       <!-- Ask user to say yes or no -->
       <prompt><voice name="ichiro">はいまたはいいえ</voice></prompt>
       
       <!-- Inline grammar to recognize yes and no -->
       <grammar mode="voice" version="1.0" xml:lang="ja-jp"  root="main" tag-format="semantics/1.0">
            <rule id="main" scope="public">
                <one-of>
                    <item>
                       はい
                       <tag>out = "yes";</tag>
                     </item>
                    <item>
                        いいえ
                        <tag>out = "no";</tag>
                    </item>
                </one-of>
            </rule>
        </grammar>

        <filled>
            <audio><voice name="ichiro">You said <value expr="f1$.utterance"/></voice></audio>
        </filled>
   </field>

  </form>

</vxml>
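
The example above only echoes what the caller said. Since the grammar tags return "yes" or "no", an application would more likely branch on that value. The filled block below is a sketch of such a branch; the wording of the two confirmation prompts is an illustrative assumption.

```xml
<filled>
    <!-- The interpretation is the string set by the grammar tag: "yes" or "no" -->
    <if cond="f1$.interpretation == 'yes'">
        <!-- "Yes, understood" -->
        <audio><voice name="ichiro">はい、承知しました</voice></audio>
    <else/>
        <!-- "Cancelled" -->
        <audio><voice name="ichiro">キャンセルしました</voice></audio>
    </if>
</filled>
```

Branching on the interpretation rather than the utterance means that synonyms added to the grammar later need no change in the dialog logic, as long as they return the same tag value.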
