I am developing an ASR system using PocketSphinx and have followed every step on this page. When I run pocketsphinx_continuous,
I get the following error:
ERROR: "ngram_search.c", line 221: Language model/set does not contain </s>, recognition will fail
My language model does contain the <s> and </s> tags, though.
My language model is as follows:
This is an ARPA-format language model file, generated by CMU Sphinx
\data\
ngram 1=3
ngram 2=1
ngram 3=1
\1-grams:
-0.4770 <s>Alif</s> -0.3010
-0.4770 <s>Baa</s> 0.0000
-0.4770 <s>Jeem</s> 0.0000
\2-grams:
-0.1761 <s>Alif</s> <s>Baa</s> -0.1249
\3-grams:
-0.3010 <s>Alif</s> <s>Baa</s> <s>Jeem</s>
\end\
The corpus file from which this was made is:
<s> Alif </s>
<s> Baa </s>
<s> Jeem </s>
Assistance in resolving this issue is highly appreciated.
When you prepared the corpus, you didn't have spaces between <s>
and Alif, so the LM training counted <s>Alif</s>
as a single word. You need spaces there, and a proper language model should look like this:
\data\
ngram 1=5
ngram 2=6
ngram 3=0
\1-grams:
-0.3010 </s> 0.0000
-99.0000 <s> -7.3814
-0.7782 Alif -99.0000
-0.7782 Baa -99.0000
-0.7782 Jeem -99.0000
\2-grams:
-0.4771 <s> Alif 0.0000
-0.4771 <s> Baa 0.0000
-0.4771 <s> Jeem 0.0000
0.0000 Alif </s> 0.0000
0.0000 Baa </s> 0.0000
0.0000 Jeem </s> 0.0000
\3-grams:
\end\
This correct LM has separate unigram entries for <s> and </s>, which is what the error message says the decoder could not find in your model.
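As a rough way to check this before running the decoder, here is a minimal Python sketch. It does not use any PocketSphinx API; the function names (space_sentence_markers, unigram_words, missing_markers) are made up for illustration, and it assumes the model is a plain-text ARPA file laid out like the ones above. The first function puts spaces around the sentence markers in a corpus line, and the other two report whether <s> and </s> appear as standalone unigrams.

```python
import re


def space_sentence_markers(line):
    """Put whitespace around <s> and </s> so they become separate tokens."""
    spaced = re.sub(r"(</?s>)", r" \1 ", line)
    # Collapse the extra whitespace introduced by the substitution.
    return " ".join(spaced.split())


def unigram_words(arpa_text):
    """Return the set of words listed in the 1-grams section of an ARPA model."""
    words = set()
    in_unigrams = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if line.startswith("\\"):
            # Section headers all start with a backslash.
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            fields = line.split()
            # Layout assumed: log-probability, word, optional backoff weight.
            if len(fields) >= 2:
                words.add(fields[1])
    return words


def missing_markers(arpa_text):
    """List the sentence markers that have no unigram entry of their own."""
    return sorted({"<s>", "</s>"} - unigram_words(arpa_text))


if __name__ == "__main__":
    broken = "\\data\\\nngram 1=1\n\\1-grams:\n-0.4770 <s>Alif</s> -0.3010\n\\end\\\n"
    print(space_sentence_markers("<s>Alif</s>"))
    print(missing_markers(broken))
```

Running missing_markers on the text of your own model file should return an empty list once the corpus has been fixed and the model regenerated.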