speech-recognition cmusphinx pocketsphinx

Pocketsphinx - What is the meaning of debug output (cmn_prior, fsg_search) of recognition process?


EDIT: since it seemed unclear, I'll make the question more specific.

What does this numerical output in pocketsphinx mean?

< INFO: cmn_prior.c(149): cmn_prior_update: to   < 55.55 10.06 -1.22 10.50 -3.09  1.89 -8.37 -9.24 -5.98 -4.85  4.65 -3.25 -3.95 >
< INFO: fsg_search.c(859): 191 frames, 4969 HMMs (26/fr), 12795 senones (66/fr), 1090 history entries (5/fr)

I'm comparing different runs with almost identical wav files and obtaining slightly different numbers, and I'm interested in what that output means, and what type of conclusion or information can be obtained from it.

I couldn't find any documentation about it. I want to understand pocketsphinx's debug output better: I have some notion of the models' internals and the theory behind the process, but I don't know how to interpret this output. Thanks!


Solution

  • < INFO: cmn_prior.c(149): cmn_prior_update: to < 55.55 10.06 -1.22 10.50 -3.09 1.89 -8.37 -9.24 -5.98 -4.85 4.65 -3.25 -3.95 >

    This line reports that the cepstral mean has been updated to those specific values, one value per cepstral coefficient (13 here). For background, read up on cepstral mean normalization (CMN).

    < INFO: fsg_search.c(859): 191 frames, 4969 HMMs (26/fr), 12795 senones (66/fr), 1090 history entries (5/fr)

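    This information relates to the Viterbi search. It says that your audio had 191 frames. During the search, 4969 HMMs were active and 12795 acoustic senones were evaluated in total; the figures in parentheses are per-frame averages (26 HMMs and 66 senones per frame). The Viterbi search history contained 1090 entries, about 5 per frame on average.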

    I'm comparing different runs with almost identical wav files and obtaining slightly different numbers

    If inputs are slightly different it is perfectly fine to see slightly different values too.

    and what type of conclusion or information can be obtained from it.

    You cannot conclude anything specific from the information you provided. If the numbers were unusual, you could look for the reason: for example, if CMN(0) were -200, well outside the usual range of 10-60, or if there were 0 frames. The values you provided are as expected.
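If you want to compare these numbers across runs, a small script can pull them out of the log and apply the sanity checks described above. This is only a rough sketch, not part of pocketsphinx: the helper names (`parse_cmn`, `parse_fsg`, `per_frame`, `find_unusual`) are made up for illustration, the regular expressions assume the exact line layout shown in the question, and the 10-60 range for CMN(0) is simply the rule of thumb from this answer. The per-frame figures are recomputed with integer division, which is an assumption that happens to be consistent with the numbers in this log (26, 66 and 5).

```python
import re

# The two log lines from the question (the leading "<" marker is omitted).
CMN_LINE = (
    "INFO: cmn_prior.c(149): cmn_prior_update: to   < 55.55 10.06 -1.22 10.50 "
    "-3.09  1.89 -8.37 -9.24 -5.98 -4.85  4.65 -3.25 -3.95 >"
)
FSG_LINE = (
    "INFO: fsg_search.c(859): 191 frames, 4969 HMMs (26/fr), "
    "12795 senones (66/fr), 1090 history entries (5/fr)"
)

# Assumed layout of the fsg_search summary line, taken from the example above.
FSG_PATTERN = re.compile(
    r"(\d+) frames, (\d+) HMMs \((\d+)/fr\), "
    r"(\d+) senones \((\d+)/fr\), (\d+) history entries \((\d+)/fr\)"
)


def parse_cmn(line):
    """Return the cepstral mean values from a cmn_prior_update line."""
    match = re.search(r"cmn_prior_update: to\s*<([^>]*)>", line)
    if match is None:
        raise ValueError("not a cmn_prior_update line")
    return [float(token) for token in match.group(1).split()]


def parse_fsg(line):
    """Return the search statistics from an fsg_search summary line."""
    match = FSG_PATTERN.search(line)
    if match is None:
        raise ValueError("not an fsg_search summary line")
    keys = (
        "frames", "hmms", "hmms_per_frame",
        "senones", "senones_per_frame",
        "history", "history_per_frame",
    )
    return dict(zip(keys, map(int, match.groups())))


def per_frame(total, frames):
    """Integer per-frame average; assumes the log truncates, not rounds."""
    return total // frames if frames else 0


def find_unusual(cmn, stats, c0_range=(10.0, 60.0)):
    """Flag the two warning signs mentioned above: no frames, odd CMN(0)."""
    problems = []
    if stats["frames"] == 0:
        problems.append("no frames were processed")
    low, high = c0_range
    if not low <= cmn[0] <= high:
        problems.append(f"CMN(0) = {cmn[0]} is outside {low}-{high}")
    return problems


cmn = parse_cmn(CMN_LINE)
stats = parse_fsg(FSG_LINE)
print(len(cmn), "cepstral mean values, CMN(0) =", cmn[0])
print("HMMs per frame:", per_frame(stats["hmms"], stats["frames"]))
print("senones per frame:", per_frame(stats["senones"], stats["frames"]))
print("history entries per frame:", per_frame(stats["history"], stats["frames"]))
print("unusual:", find_unusual(cmn, stats))
```

For the lines in the question the list of problems comes back empty, which matches the conclusion above that the values are as expected. Running the same checks on each of your wav files would let you spot a run where the frame count or CMN(0) really is out of line.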