Tags: speech-recognition, asterisk, cmusphinx

How to reduce speech recognition time in CMU Sphinx?


I want to add speech recognition to an Asterisk server, and I would like to try an offline solution based on CMU Sphinx. But it works very slowly: recognition with a simple dictionary (yes|no|normal) takes about 20 seconds. I use this command:

pocketsphinx_continuous \
    -samprate 8000 \
    -dict my.dic \
    -lm ru.lm \
    -hmm zero_ru.cd_cont_4000 \
    -maxhmmpf 3000 \
    -maxwpf 5 \
    -topn 2 \
    -ds 2 \
    -logfn log.log \
    -remove_noise no \
    -infile 1.wav

Is it possible to reduce the time to 1-2 seconds, or must I look at an online solution (Google, Yandex, etc.)?


Solution

  • There are several mistakes in your attempt:

    • You use a continuous model, which is slow. A PTM model is a better choice.
    • You use a statistical language model where a simple grammar would do.
    • You run a command to recognize one short file, so most of the time goes into reading the model. Use a server with the model preloaded instead. A UniMRCP server can process this request in 1/100 of a second.
    • You removed words from the dictionary. Keep the dictionary as is and restrict the words in the language model or grammar instead.
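
If a full server is more than you need for a first test, the cost of loading the model can also be spread over many files by decoding them in a single process. The sketch below uses pocketsphinx_batch for that, which is a different technique from the UniMRCP server mentioned above. The `wav` directory, `calls.fileids`, `calls.hyp` and both function names are assumptions made for illustration; the dictionary, grammar and model names are the ones used elsewhere in this thread.

```shell
# Hypothetical helper: list every .wav file in a directory, one basename per
# line without the extension, which is the format pocketsphinx_batch expects
# in its control file.
make_ctl() {
    dir=$1
    for f in "$dir"/*.wav; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        basename "$f" .wav
    done
}

# Sketch only: decode all listed files in one process, so the model is read once.
# Paths and output names here are assumptions, not part of the original setup.
decode_all() {
    make_ctl wav > calls.fileids
    pocketsphinx_batch \
        -adcin yes \
        -cepdir wav \
        -cepext .wav \
        -ctl calls.fileids \
        -samprate 8000 \
        -dict ru.dic \
        -jsgf my.jsgf \
        -hmm zero_ru.cd_ptm_4000 \
        -hyp calls.hyp
}
```

Calling `decode_all` should write one hypothesis per input file to `calls.hyp`. For live calls in Asterisk a server with a preloaded model is still the better fit, since a batch run only helps when the recordings already exist.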

    A proper command would be:

    pocketsphinx_continuous \
        -samprate 8000 \
        -dict ru.dic \
        -jsgf my.jsgf \
        -hmm zero_ru.cd_ptm_4000 \
        -infile 1.wav
    

    The JSGF grammar should look like this:

    #JSGF V1.0;
    
    grammar result;
    
    public <result> = да | нет | нормально;
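
Since the vocabulary is restricted in the grammar rather than in the dictionary, it can be convenient to generate the grammar file from a word list. The following is a minimal sketch, not part of pocketsphinx: `make_jsgf` is a hypothetical helper that prints a grammar in the same shape as the one above, accepting exactly one of the given words.

```shell
# Hypothetical helper: print a JSGF grammar that accepts exactly one of the
# given words. Usage: make_jsgf GRAMMAR_NAME WORD [WORD...]
make_jsgf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_jsgf GRAMMAR_NAME WORD [WORD...]" >&2
        return 1
    fi
    name=$1
    shift
    # Start with the first word, then join the rest with " | ".
    alternatives=$1
    shift
    for word in "$@"; do
        alternatives="$alternatives | $word"
    done
    printf '#JSGF V1.0;\n\n'
    printf 'grammar %s;\n\n' "$name"
    printf 'public <%s> = %s;\n' "$name" "$alternatives"
}
```

For the three words in this question it could be called as `make_jsgf result да нет нормально > my.jsgf`. Every word still has to be present in the full dictionary, otherwise the decoder cannot use it.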
    

    The whole command takes less than a second to run:

    real    0m0.822s
    user    0m0.789s
    sys     0m0.028s
    

    The actual recognition takes 0.02 seconds, as the log shows:

    INFO: fsg_search.c(265): TOTAL fsg 0.02 CPU 0.006 xRT
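
To compare model loading time with search time across runs, the search figure can be pulled out of the log automatically. The sketch below assumes the log contains a line in exactly the format shown above; `fsg_cpu_seconds` is a hypothetical helper name, and other pocketsphinx versions may format this line differently.

```shell
# Hypothetical helper: read a pocketsphinx log on stdin and print the grammar
# search CPU time in seconds. It assumes a line of the form:
#   INFO: fsg_search.c(265): TOTAL fsg 0.02 CPU 0.006 xRT
fsg_cpu_seconds() {
    awk '/TOTAL fsg .* CPU/ {
        # The time is the field between "fsg" and "CPU".
        for (i = 1; i < NF; i++)
            if ($i == "fsg" && $(i + 2) == "CPU") t = $(i + 1)
    }
    END { if (t != "") print t }'
}
```

With the `-logfn log.log` option from the question, it could be run as `fsg_cpu_seconds < log.log`. Subtracting the result from the `real` time reported by `time` gives a rough estimate of how much of the run is spent on startup and model loading.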