speech-recognition sphinx4

Decoding phone calls in ALAW format with sphinx4


We are planning to develop a system that converts speech from phone calls to text using sphinx4. The phone calls have the following format:
Type: Audio
Codec: PCM ALAW
Channels: Mono
Sample Rate: 8kHz
Bit Depth: 8 bits per sample

The tutorial says:

If you are using sound files with a sampling rate of 8KHz (telephone audio), you need to change some values in etc/sphinx_train.cfg

Are there any other changes to be made apart from this?
Is it possible to develop a system for audio with 8 bits per sample? I ask because the tutorial says:

“It's critical to have audio files in a specific format. Sphinxtrain does support some variety of sample rates but by default it's configured to train from 16khz 16bit mono files in MS WAV format.”


Solution

  • In the tutorial it says

    That tutorial is about training with Sphinxtrain and is irrelevant for you. The tutorial you need to follow is http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4

    Are there any other changes to be done apart from this?

    You do not need those changes at all.

    Is it possible to develop a system for bit rate of 8 bits/sec because in the tutorial it says

    You need to convert the audio from ALAW to 8khz 16bit PCM first. This conversion must be done outside sphinx4, with an external tool such as sox or with an audio library. You then decode the resulting 8khz 16bit PCM audio.

    In sphinx4, to decode 8khz audio properly, set the sample rate on the configuration:

      configuration.setSampleRate(8000);
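
For the ALAW to 16bit PCM conversion step mentioned above, here is a minimal sketch in plain Java of the standard G.711 A-law expansion, in case you would rather convert in-process than shell out to sox. The class name `ALawDecoder` and its method names are made up for illustration and are not part of sphinx4. It assumes you already have the raw A-law payload bytes, with any WAV or container header stripped.

```java
/** Illustrative G.711 A-law expansion; names are hypothetical, not a sphinx4 API. */
final class ALawDecoder {
    private ALawDecoder() {}

    /** Expands one 8-bit A-law code word to a signed 16-bit linear PCM sample. */
    static short decodeSample(byte alaw) {
        int a = (alaw & 0xFF) ^ 0x55;      // undo the even-bit inversion used by A-law
        int mantissa = (a & 0x0F) << 4;    // 4-bit mantissa, shifted into position
        int segment = (a & 0x70) >> 4;     // 3-bit segment number (exponent)
        int magnitude;
        if (segment == 0) {
            magnitude = mantissa + 8;      // linear segment near zero
        } else {
            magnitude = (mantissa + 0x108) << (segment - 1);
        }
        // After the XOR, a set top bit marks a positive sample.
        return (short) ((a & 0x80) != 0 ? magnitude : -magnitude);
    }

    /** Converts a buffer of A-law bytes to 16-bit little-endian PCM bytes. */
    static byte[] toPcm16LittleEndian(byte[] alaw) {
        byte[] pcm = new byte[alaw.length * 2];
        for (int i = 0; i < alaw.length; i++) {
            short sample = decodeSample(alaw[i]);
            pcm[2 * i] = (byte) (sample & 0xFF);            // low byte first
            pcm[2 * i + 1] = (byte) ((sample >> 8) & 0xFF); // then high byte
        }
        return pcm;
    }
}
```

The output of `toPcm16LittleEndian` is headerless 8khz 16bit mono PCM, so the sample rate is unchanged and only the sample encoding differs. You would still set the sample rate to 8000 on the sphinx4 configuration as shown above.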