I currently use Pocketsphinx to run speech-to-text on an audio file.
I use this command:
pocketsphinx_continuous -logfn /dev/null -infile audio.wav > text.txt
I want to know whether there is a way to get the start and end timestamps of each word, like this:
startTime: 0.000s, endTime: 0.200s, word: hello
startTime: 0.250s, endTime: 0.500s, word: world
I don't have to use Pocketsphinx, but I need a free, unlimited way to run speech-to-text on an audio file on Linux.
Thanks to @NikolayShmyrev, the answer is simply to add -time yes to the command:
pocketsphinx_continuous -logfn /dev/null -infile audio.wav -time yes > text.txt
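
To get the exact layout from the question, the timed output can be piped through a small filter. This is only a sketch: it assumes that with -time yes each word is printed on its own line as "word start end probability" (which is what I see, but check your version's output), and format_timestamps is a hypothetical helper name, not part of Pocketsphinx.

```shell
# Hypothetical helper: reads recognizer output on stdin and reformats the
# per-word lines. Assumes each word line has four fields:
#   word  start_seconds  end_seconds  probability
# Lines that do not match (e.g. the full hypothesis text) are skipped.
format_timestamps() {
    awk 'NF == 4 && $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ {
        printf "startTime: %.3fs, endTime: %.3fs, word: %s\n", $2, $3, $1
    }'
}

# Usage (assumes pocketsphinx_continuous is installed and audio.wav exists):
# pocketsphinx_continuous -logfn /dev/null -infile audio.wav -time yes \
#     | format_timestamps > text.txt
```

Note that the output may also contain silence or sentence markers such as <s>, </s> and <sil>; they match the same four-field pattern, so filter them out separately if you don't want them.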