python audio speech-recognition speech-to-text

How to match text to audio in Python?

I have an audio file and a text that corresponds to the speech in this audio file.

Is there a way to match the text to the audio so that I get timestamps showing where each word of the text occurs in the audio?

Solution

So I have found exactly what I was looking for.

Apparently, the technique that matches a given text to an audio recording and returns the timestamps is called forced alignment.

Here is a very useful list of forced alignment tools: https://github.com/pettarin/forced-alignment-tools

Personally, I have used Aeneas as it worked really well for me.
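
For anyone who wants a starting point, here is a minimal sketch of how this can look with aeneas, based on the Task/ExecuteTask usage shown in its README. The helper names `align` and `read_fragments` are my own, not part of aeneas, and I am assuming a plain-text transcript, English audio, and JSON output. aeneas aligns one fragment per line of the text file, so for word-level timestamps you would put one word per line (the results at that granularity may be less precise).

```python
import json
import os


def align(audio_path, text_path, sync_map_path, language="eng"):
    """Run aeneas forced alignment and write a JSON sync map (hypothetical helper)."""
    # Imported lazily so read_fragments() below works without aeneas installed.
    from aeneas.executetask import ExecuteTask
    from aeneas.task import Task

    # Assumption: plain-text input (one fragment per line) and JSON output.
    config = "task_language={}|is_text_type=plain|os_task_file_format=json".format(language)
    task = Task(config_string=config)
    task.audio_file_path_absolute = os.path.abspath(audio_path)
    task.text_file_path_absolute = os.path.abspath(text_path)
    task.sync_map_file_path_absolute = os.path.abspath(sync_map_path)

    ExecuteTask(task).execute()   # compute the alignment
    task.output_sync_map_file()   # write it to sync_map_path


def read_fragments(sync_map_json):
    """Convert JSON sync map text into (begin_seconds, end_seconds, text) tuples."""
    data = json.loads(sync_map_json)
    return [
        (float(frag["begin"]), float(frag["end"]), " ".join(frag["lines"]))
        for frag in data["fragments"]
    ]
```

You would call something like `align("audio.mp3", "transcript.txt", "syncmap.json")` (placeholder file names), then pass the contents of `syncmap.json` to `read_fragments` to get the start and end time of each line of text.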