
Language model/set does not contain </s>


I am developing an ASR system using PocketSphinx, and I have followed every step on this page. When I run pocketsphinx_continuous, I get the following error:

ERROR: "ngram_search.c", line 221: Language model/set does not contain </s>, recognition will fail

My language model does contain the <s> and </s> tags, though.

My language model is as follows:

    This is an ARPA-format language model file, generated by CMU Sphinx
    \data\
    ngram 1=3
    ngram 2=1
    ngram 3=1

    \1-grams:
    -0.4770 <s>Alif</s> -0.3010
    -0.4770 <s>Baa</s> 0.0000
    -0.4770 <s>Jeem</s> 0.0000

    \2-grams:
    -0.1761 <s>Alif</s> <s>Baa</s> -0.1249

    \3-grams:
    -0.3010 <s>Alif</s> <s>Baa</s> <s>Jeem</s>

    \end\
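The error says the model's vocabulary has no `</s>` word. One way to check an ARPA file for this yourself is to list the words in its \1-grams: section. The helper below, `arpa_unigrams`, is a hypothetical name for a minimal Python sketch that assumes a well-formed ARPA file; it is not how ngram_search.c performs the check internally.

```python
def arpa_unigrams(arpa_text: str) -> set[str]:
    """Return the set of words listed in the \\1-grams: section of an ARPA LM."""
    words = set()
    in_unigrams = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            # Any other section header (\2-grams:, \end\) ends the unigram list.
            if line.startswith("\\"):
                break
            fields = line.split()
            # Unigram lines look like: log10-prob  word  [backoff-weight]
            if len(fields) >= 2:
                words.add(fields[1])
    return words


# Unigram section of the model from the question.
question_lm = r"""
\data\
ngram 1=3

\1-grams:
-0.4770 <s>Alif</s> -0.3010
-0.4770 <s>Baa</s> 0.0000
-0.4770 <s>Jeem</s> 0.0000

\end\
"""

print(sorted(arpa_unigrams(question_lm)))
# ['<s>Alif</s>', '<s>Baa</s>', '<s>Jeem</s>']
print("</s>" in arpa_unigrams(question_lm))  # False
```

Every "word" in this model has the markers glued onto it, so there is no standalone `</s>` entry for the recognizer to find.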

The corpus file from which this was made is:

    <s> Alif </s>
    <s> Baa </s>
    <s> Jeem </s>
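Whether `<s>` and `</s>` become separate words depends on how the corpus text is tokenized. Assuming the LM training tool splits words on whitespace, this small Python sketch (the helper `distinct_unigrams` is hypothetical) shows the difference between unspaced and spaced corpus lines:

```python
def distinct_unigrams(corpus_lines: list[str]) -> set[str]:
    """Whitespace-tokenize each line and collect the distinct tokens."""
    vocab = set()
    for line in corpus_lines:
        vocab.update(line.split())
    return vocab


unspaced = ["<s>Alif</s>", "<s>Baa</s>", "<s>Jeem</s>"]
spaced = ["<s> Alif </s>", "<s> Baa </s>", "<s> Jeem </s>"]

print(sorted(distinct_unigrams(unspaced)))
# ['<s>Alif</s>', '<s>Baa</s>', '<s>Jeem</s>']
print(sorted(distinct_unigrams(spaced)))
# ['</s>', '<s>', 'Alif', 'Baa', 'Jeem']
```

The unspaced lines yield three words and none of them is `</s>`, which matches the `ngram 1=3` header and the unigram list of the model above.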

Assistance in resolving this issue is highly appreciated.


Solution

  • When you prepared the corpus, you didn't put spaces between <s> and Alif, so LM training counted <s>Alif</s> as a single word. You need spaces there, and a proper language model should look like this:

    \data\
    ngram 1=5
    ngram 2=6
    ngram 3=0
    
    
    \1-grams:
    -0.3010 </s> 0.0000
    -99.0000 <s> -7.3814
    -0.7782 Alif -99.0000
    -0.7782 Baa -99.0000
    -0.7782 Jeem -99.0000
    
    \2-grams:
    -0.4771 <s> Alif 0.0000
    -0.4771 <s> Baa 0.0000
    -0.4771 <s> Jeem 0.0000
    0.0000 Alif </s> 0.0000
    0.0000 Baa </s> 0.0000
    0.0000 Jeem </s> 0.0000
    
    \3-grams:
    
    \end\
    

    This corrected LM has a separate unigram entry for </s>.
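If the corpus was generated without the spaces, it can be repaired before rerunning LM training. The Python sketch below (helper names `space_sentence_markers` and `distinct_bigrams` are hypothetical) pads `<s>` and `</s>` with spaces, then counts distinct unigrams and bigrams; the counts agree with the `ngram 1=5` and `ngram 2=6` header lines of the corrected model. It does not reproduce the probabilities or the trigram section, which depend on the toolkit's smoothing and settings.

```python
import re


def space_sentence_markers(line: str) -> str:
    """Put whitespace around <s> and </s> so they become separate tokens."""
    # Pad every marker with spaces, then collapse runs of whitespace.
    padded = re.sub(r"(</?s>)", r" \1 ", line)
    return " ".join(padded.split())


def distinct_bigrams(corpus_lines: list[str]) -> set[tuple[str, str]]:
    """Collect adjacent token pairs within each line."""
    pairs = set()
    for line in corpus_lines:
        tokens = line.split()
        pairs.update(zip(tokens, tokens[1:]))
    return pairs


raw = ["<s>Alif</s>", "<s>Baa</s>", "<s>Jeem</s>"]
fixed = [space_sentence_markers(line) for line in raw]

print(fixed)  # ['<s> Alif </s>', '<s> Baa </s>', '<s> Jeem </s>']
print(len({w for line in fixed for w in line.split()}))  # 5
print(len(distinct_bigrams(fixed)))  # 6
```

Writing the fixed lines back to the corpus file and regenerating the LM is left to whatever workflow you already use.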