speech-recognition speech-to-text n-gram sphinx4 language-model

Sphinx 4 corrupted ARPA LM?


I have an ARPA language model (LM) generated by kylm. When I run Sphinx 4 with it, I get this exception stack trace:

Exception in thread "main" java.lang.RuntimeException: Allocation of search manager resources failed
        at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:242)
        at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:87)
        at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:168)
        at transcribing.Main.main(Main.java:78)
Caused by: java.io.IOException: Corrupt Language Model file:./corpus.arpa at line 2420:Premature EOF
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.corrupt(SimpleNGramModel.java:458)
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.readLine(SimpleNGramModel.java:404)
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.load(SimpleNGramModel.java:307)
        at edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel.allocate(SimpleNGramModel.java:110)
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:342)
        at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:238)
        ... 3 more
Java Result: 1

Here's an excerpt of the ARPA LM:

[n]
3

[smoother]
kylm.model.ngram.smoother.KNSmoother

[closed]
true

[max_length]
1091

[vocab_cutoff]
0

[start_symbol]
<s>

[terminal_symbol]
</s>

[unknown_symbol]
<unk>

\data\
ngram 1=406
ngram 2=768
ngram 3=937
\1-grams: 
-99.0000    <s> -0.3630
...
...

\end\

PS: there is a newline after \end\

The exception says that Sphinx hit an unexpected EOF on the last line of the file (line 2420). Isn't it supposed to encounter an EOF there?

Any help would be appreciated!


Solution

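  • It turns out to be a Sphinx 4 bug.

    If the \1-grams: directive (or any other directive) contains trailing spaces, SimpleNGramModel fails to parse it. In the excerpt above, the \1-grams: line does end with a space. Presumably the loader never recognizes the section header and keeps reading until the input runs out, which would explain why "Premature EOF" is reported at the last line. I just submitted the patch, but you can find it here.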
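
    If you cannot apply the patch to your Sphinx 4 build, one possible workaround is to strip trailing whitespace from every line of the ARPA file before loading it. This is only a minimal sketch: the class name ArpaTrailingSpaceFix and the command-line arguments are made up for illustration (they are not part of Sphinx 4 or kylm), and it assumes the model is UTF-8 text small enough to read into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sphinx 4 or kylm.
final class ArpaTrailingSpaceFix {

    // Returns a copy of the lines with trailing whitespace removed from each one.
    // Leading and interior whitespace (e.g. the tabs between fields) is left alone.
    static List<String> stripTrailingWhitespace(List<String> lines) {
        List<String> cleaned = new ArrayList<>(lines.size());
        for (String line : lines) {
            cleaned.add(line.replaceAll("\\s+$", ""));
        }
        return cleaned;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ArpaTrailingSpaceFix <input.arpa> <output.arpa>");
            System.exit(1);
        }
        Path input = Paths.get(args[0]);
        Path output = Paths.get(args[1]);

        // Assumption: the ARPA file is UTF-8 encoded.
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        Files.write(output, stripTrailingWhitespace(lines), StandardCharsets.UTF_8);
    }
}
```

    Run it once on the kylm output (for example `java ArpaTrailingSpaceFix corpus.arpa corpus-clean.arpa`) and point the Sphinx configuration at the cleaned file.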