Identify start/stop times of spoken words within a phrase using Sphinx

I'm trying to identify the start/end time of individual words within a phrase. I have a WAV file of the phrase AND the text of the utterance.

Is there an intelligent way of combining these two sources of data (audio and text) to improve Sphinx's recognition? What I'd like as output is an accurate start/stop time for each word within the phrase.

(I know you can pass -time yes to pocketsphinx to get the timing data I'm looking for; however, the speech recognition itself is not very accurate.)

The solution cannot be speaker-specific, as the corpus I'm working with contains many different speakers, although they all speak US English.

Solution

We have a dedicated tool for exactly this: the long audio aligner in sphinx4. See:

http://cmusphinx.sourceforge.net/2014/07/long-audio-aligner-landed-in-trunk/
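To illustrate why knowing the transcript helps (this is a toy sketch, not sphinx4's implementation): once the words are fixed, the problem becomes forced alignment. The decoder no longer has to guess *which* words were spoken, only *where* each one starts and ends, which a dynamic-programming search over frames can answer. The sketch assumes you already have a per-frame log-likelihood score for each transcript word (the `scores` input is hypothetical; a real aligner scores HMM states of phones with an acoustic model) and an assumed 10 ms frame shift.

```python
import math
from typing import List, Tuple

# Assumption: 100 frames per second (10 ms frame shift).
FRAME_SHIFT_S = 0.01


def force_align(scores: List[List[float]], words: List[str]) -> List[Tuple[str, float, float]]:
    """Toy forced alignment.

    scores[t][w] is a (hypothetical) log-likelihood that frame t belongs to words[w].
    Each word gets at least one contiguous run of frames, in transcript order.
    Returns (word, start_seconds, end_seconds) for every word.
    """
    T, N = len(scores), len(words)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per word")
    NEG = -math.inf
    # best[t][w]: best total score for frames 0..t with frame t assigned to word w
    best = [[NEG] * N for _ in range(T)]
    # back[t][w] == 1 means frame t is the first frame of word w
    back = [[0] * N for _ in range(T)]
    best[0][0] = scores[0][0]
    for t in range(1, T):
        for w in range(min(t + 1, N)):
            stay = best[t - 1][w]                               # continue the same word
            advance = best[t - 1][w - 1] if w > 0 else NEG      # start the next word
            if advance > stay:
                best[t][w] = advance + scores[t][w]
                back[t][w] = 1
            else:
                best[t][w] = stay + scores[t][w]
    # Backtrack from the last frame of the last word to recover word start frames.
    starts = [0] * N
    w = N - 1
    for t in range(T - 1, 0, -1):
        if back[t][w]:
            starts[w] = t
            w -= 1
    result = []
    for i, word in enumerate(words):
        end = starts[i + 1] if i + 1 < N else T
        result.append((word, starts[i] * FRAME_SHIFT_S, end * FRAME_SHIFT_S))
    return result


if __name__ == "__main__":
    # Six frames: the first three favour "hello", the last three favour "world".
    demo = [[0, -5], [0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
    for word, start, end in force_align(demo, ["hello", "world"]):
        print(f"{word}: {start:.2f}s - {end:.2f}s")
```

In practice you would not write this yourself; the sphinx4 aligner linked above handles acoustic scoring, pronunciation lookup and long recordings. The sketch only shows the core idea that the known text constrains the search to a single path through the words.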
