Sphinx-4 aligner skips plain words like `you` and `in`, and words with dashes. Why?

I'm trying to align a simple text. Here are links to the text and audio files:
http://s000.tinyupload.com/?file_id=48044768133759453374
http://s000.tinyupload.com/?file_id=99891199139563396901

Here are the configuration settings:

```java
private static final String ACOUSTIC_MODEL_PATH =
        "resource:/edu/cmu/sphinx/models/en-us/en-us";
private static final String DICTIONARY_PATH =
        "resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict";
```

The output I get is the following (the ellipses were added by me):

```
- ï
- ¿in
  a                         [11250:11330]
  standard                  [11330:11920]
  shopping                  [11920:12440]
  centre                    [12440:13020]
- you
  can                       [13380:13730]
  ...
  shops                     [15170:15790]
- you
  can                       [16620:16890]
  buy                       [16890:17140]
  ...
  and                       [26920:27230]
  suits                     [27190:27220]
- thereâ€™s
  a                         [29160:29210]
  sportswear                [29210:29980]
  ...
  clothes                   [33330:33360]
- t-shirts
  shorts                    [35560:36320]
  jumpers                   [36630:37410]
  ...
  for                       [41860:42010]
```

As you can see, for some reason it:

- didn't recognize `in` before the first `a`
- gave no timing for multiple instances of `you`
- didn't recognize `there's`; instead it identified it as `thereâ€™s`
- gave no timing for words with dashes, like `t-shirts`

Is there any way I can configure Sphinx to provide timings for these occurrences?

Solution

Some comments

didn't recognize `in` before the first `a`

Your text file starts with a byte order mark (BOM), which is unknown to the aligner; that is where the `ï` and `¿in` tokens come from. It is better to remove it before alignment.
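A minimal sketch of removing the BOM in Java before the text is passed to the aligner, assuming the transcript is a UTF-8 file. In UTF-8 the BOM is the bytes `EF BB BF`, which decode to `ï»¿` in a single-byte charset such as Windows-1252, so reading the file explicitly as UTF-8 also matters. `TranscriptCleaner`, `stripBom` and `readTranscript` are hypothetical names, not part of the Sphinx-4 API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not Sphinx-4 API): cleans a transcript before alignment.
public class TranscriptCleaner {
    // U+FEFF is the byte order mark; in a UTF-8 file it is stored as EF BB BF.
    private static final char BOM = '\uFEFF';

    /** Returns the text without a leading byte order mark, if there is one. */
    public static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    /** Reads the transcript explicitly as UTF-8 (not the platform default) and drops the BOM. */
    public static String readTranscript(Path path) throws IOException {
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return stripBom(text);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TranscriptCleaner transcript.txt
        if (args.length > 0) {
            System.out.println(readTranscript(Paths.get(args[0])));
        }
    }
}
```

The returned string, rather than the raw file, is what would then be handed to the aligner.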

didn't recognize `there's`; instead it identified it as `thereâ€™s`

Your text uses typographic (curly) apostrophes such as `’` (U+2019), which are unknown to the aligner. You should convert them to the ASCII apostrophe `'`.
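A minimal sketch of that conversion in Java, assuming the transcript has already been decoded correctly as UTF-8. The `thereâ€™s` in the output is what the UTF-8 bytes of `there’s` (`E2 80 99`) look like when read as Windows-1252. `ApostropheNormalizer` is a hypothetical helper; mapping U+2018 and U+02BC as well as U+2019 is an assumption about what else might occur in such text.

```java
// Hypothetical helper (not Sphinx-4 API): converts typographic apostrophes to ASCII.
public class ApostropheNormalizer {
    /**
     * Replaces curly single quotes and the modifier-letter apostrophe with the ASCII
     * apostrophe (U+0027), so "there’s" becomes "there's", the ASCII spelling the
     * dictionary is assumed to use for contractions.
     */
    public static String toAsciiApostrophes(String text) {
        return text
                .replace('\u2019', '\'')   // RIGHT SINGLE QUOTATION MARK (usual curly apostrophe)
                .replace('\u2018', '\'')   // LEFT SINGLE QUOTATION MARK
                .replace('\u02BC', '\'');  // MODIFIER LETTER APOSTROPHE
    }

    public static void main(String[] args) {
        System.out.println(toAsciiApostrophes("there\u2019s a sportswear shop"));
    }
}
```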

no timing for words with dashes, like `t-shirts`

Those words are missing from the dictionary. You can add them to the dictionary before alignment, or specify a g2p (grapheme-to-phoneme) model to convert them to phonetic transcriptions.
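Before choosing between those two options, it can help to list which transcript words the dictionary lacks. A minimal sketch in Java, assuming a local copy of the dictionary whose lines look like `word PH1 PH2 ...`, with alternate pronunciations written as `word(2) ...`. `OovChecker`, `loadWords` and `findMissing` are hypothetical helpers, not Sphinx-4 API, and the simple whitespace tokenization here will not exactly match the aligner's own text processing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not Sphinx-4 API): reports transcript words missing from a dictionary.
public class OovChecker {
    /** Collects the headwords of "word PH1 PH2 ..." lines; "read(2)" is folded into "read". */
    public static Set<String> loadWords(List<String> dictLines) {
        Set<String> words = new HashSet<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) continue;
            String word = trimmed.split("\\s+")[0];
            int paren = word.indexOf('(');
            if (paren > 0) word = word.substring(0, paren); // drop "(2)" variant marker
            words.add(word.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    /** Returns lowercased transcript words, in first-seen order, that the dictionary lacks. */
    public static List<String> findMissing(String transcript, Set<String> dictionary) {
        Set<String> missing = new LinkedHashSet<>();
        for (String token : transcript.toLowerCase(Locale.ROOT).split("\\s+")) {
            // Trim leading/trailing punctuation but keep inner apostrophes and dashes.
            String word = token.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!word.isEmpty() && !dictionary.contains(word)) missing.add(word);
        }
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local copy of the dictionary, args[1]: transcript (both assumed UTF-8)
        Set<String> dict = loadWords(Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8));
        String transcript = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        for (String word : findMissing(transcript, dict)) System.out.println(word);
    }
}
```

Each reported word could then get its own pronunciation line in a copy of the dictionary, or be left to the g2p model.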