speech-recognition sphinx4

sphinx-4 aligner skips plain words like `you`, `in` and words with dashes - why?


I'm trying to align simple text. Here are the links to text and audio files:
http://s000.tinyupload.com/?file_id=48044768133759453374
http://s000.tinyupload.com/?file_id=99891199139563396901

Here are the configuration settings:

private static final String ACOUSTIC_MODEL_PATH =
        "resource:/edu/cmu/sphinx/models/en-us/en-us";
private static final String DICTIONARY_PATH =
        "resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict";

The output I get is the following (the ellipses were added by me):

- ï
- ¿in
  a                         [11250:11330]
  standard                  [11330:11920]
  shopping                  [11920:12440]
  centre                    [12440:13020]
- you
  can                       [13380:13730]
  ...
  shops                     [15170:15790]
- you
  can                       [16620:16890]
  buy                       [16890:17140]
  ...
  and                       [26920:27230]
  suits                     [27190:27220]
- there’s
  a                         [29160:29210]
  sportswear                [29210:29980]
  ...
  clothes                   [33330:33360]
- t-shirts
  shorts                    [35560:36320]
  jumpers                   [36630:37410]
  ...
  for                       [41860:42010]

As you can see, for some reason the aligner has several problems:

  • didn't recognize `in` before the first `a`
  • no timing for multiple instances of `you`
  • didn't recognize `there's`, instead it identified it as `there’s`
  • no timing for words with dashes, like `t-shirts`

Is there any way I can configure sphinx to provide timings for these occurrences?


Solution

  • Some comments

    didn't recognize `in` before the first `a`

    Your text file starts with a byte order mark (BOM), which is unknown to the aligner. It is better to remove it before alignment.

    didn't recognize `there's`, instead it identified it as `there’s`

    Your text uses typographic Unicode apostrophes (’), which are unknown to the aligner. You should convert them to the ASCII equivalent (') before alignment.

    no timing for words with dashes, like `t-shirts`

    Those words are missing from the dictionary. You can add them to the dictionary before alignment, or specify a g2p (grapheme-to-phoneme) model to convert them to phonemes.
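
The cleanup described above can be sketched in plain Java, before the transcript is handed to the aligner. This is only an illustration under a few assumptions: `TranscriptCleaner`, `normalize` and `missingWords` are hypothetical names and not part of sphinx4, the transcript is assumed to have been decoded as UTF-8 (so the BOM shows up as the single character U+FEFF), and the tokenization is a rough guess that may differ from what the aligner does internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of sphinx4): cleans a transcript before alignment.
final class TranscriptCleaner {

    private TranscriptCleaner() {}

    /** Strips a leading BOM and maps typographic single quotes to the ASCII apostrophe. */
    static String normalize(String text) {
        // A UTF-8 BOM decodes to the character U+FEFF at the start of the text.
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        // Right and left single quotation marks (U+2019, U+2018) become '.
        return text.replace('\u2019', '\'').replace('\u2018', '\'');
    }

    /**
     * Returns the distinct lowercased words of the text that are absent from
     * dictionaryWords, in order of first appearance.
     */
    static List<String> missingWords(String text, Set<String> dictionaryWords) {
        Set<String> missing = new LinkedHashSet<>();
        // Assumed tokenization: keep letters, apostrophes and hyphens together.
        String[] tokens = normalize(text).toLowerCase(Locale.ROOT).split("[^\\p{L}'-]+");
        for (String token : tokens) {
            // Skip empty tokens and stray punctuation such as a lone dash.
            if (token.chars().noneMatch(Character::isLetter)) {
                continue;
            }
            if (!dictionaryWords.contains(token)) {
                missing.add(token);
            }
        }
        return new ArrayList<>(missing);
    }
}
```

Pass the result of `normalize` to the aligner instead of the raw file contents. Any word reported by `missingWords` (for example `t-shirts`) is a candidate for a new dictionary entry or for the g2p model.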