Tags: python, audio, speech-recognition, speech-to-text

How to match text to audio in Python?


I have an audio file and a text that corresponds to the speech in this audio file.

Is there a way to align the text with the audio so that I get timestamps showing where each word of the text occurs in the audio?


Solution

  • So I have found exactly what I was looking for.

    Apparently, the technique that aligns a given text with an audio recording and returns the exact timestamps is called forced alignment.

    Here is a very useful list of forced alignment tools: https://github.com/pettarin/forced-alignment-tools

    Personally, I used Aeneas, and it worked really well for me.
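Aeneas can also be driven from Python. The following is a minimal sketch modeled on the Python API example in the aeneas README (`Task`, `ExecuteTask`, `output_sync_map_file`). It assumes aeneas and its external dependencies (such as espeak and ffmpeg) are installed; `align_with_aeneas` and `parse_sync_map` are hypothetical helper names, not part of aeneas. The parser assumes the JSON sync map layout aeneas writes: a `fragments` list whose entries carry `begin`, `end`, and `lines`.

```python
import json
import os


def align_with_aeneas(audio_path, text_path, out_path, language="eng"):
    """Align a plain-text file (one fragment per line) with an audio file.

    Hypothetical helper. Requires aeneas and its external tools (e.g. espeak,
    ffmpeg); the imports are lazy so parse_sync_map() works without aeneas.
    """
    from aeneas.executetask import ExecuteTask
    from aeneas.task import Task

    # plain text => one timed fragment per line; write the sync map as JSON
    config = f"task_language={language}|is_text_type=plain|os_task_file_format=json"
    task = Task(config_string=config)
    # aeneas expects absolute paths for these attributes
    task.audio_file_path_absolute = os.path.abspath(audio_path)
    task.text_file_path_absolute = os.path.abspath(text_path)
    task.sync_map_file_path_absolute = os.path.abspath(out_path)
    ExecuteTask(task).execute()  # compute the alignment
    task.output_sync_map_file()  # write the JSON sync map to out_path


def parse_sync_map(json_text):
    """Hypothetical helper: (text, begin_sec, end_sec) tuples from a JSON sync map."""
    data = json.loads(json_text)
    return [
        (" ".join(frag["lines"]), float(frag["begin"]), float(frag["end"]))
        for frag in data["fragments"]
    ]
```

With one word per line in the text file, a call such as `align_with_aeneas("speech.mp3", "words.txt", "syncmap.json")` (hypothetical file names) should produce one fragment per word, and `parse_sync_map` turns the resulting file contents into word-level timestamps.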
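As for how a tool like Aeneas finds the timestamps: according to its documentation, it synthesizes the text with a TTS engine, extracts MFCC features from both the synthetic and the real audio, and aligns the two feature sequences with dynamic time warping (DTW). Because the word boundaries in the synthetic audio are known, the alignment path carries them over to the real recording. Below is a toy sketch of that DTW step only, assuming one scalar feature per frame and absolute difference as the distance (real tools use MFCC vectors and a banded DTW for speed). `dtw_path` and `word_times` are hypothetical names, not part of aeneas.

```python
def dtw_path(ref, obs):
    """Return the minimum-cost monotonic alignment path between two sequences.

    ref: features of the synthesized (reference) speech, one value per frame.
    obs: features of the recorded speech, one value per frame.
    Each path element (i, j) pairs ref frame i with obs frame j.
    """
    if not ref or not obs:
        raise ValueError("both sequences must be non-empty")
    n, m = len(ref), len(obs)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of ref[:i] with obs[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - obs[j - 1])  # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def word_times(word_spans, path, frame_sec):
    """Map per-word ref frame spans [start, end) to (word, begin_sec, end_sec)."""
    times = []
    for word, start, end in word_spans:
        # every ref frame appears on a DTW path, so this list is never empty
        obs_frames = [j for i, j in path if start <= i < end]
        times.append((word, min(obs_frames) * frame_sec, (max(obs_frames) + 1) * frame_sec))
    return times
```

For example, with `ref = [1.0, 1.0, 5.0, 5.0]` (a "word" in frames 0-1 and another in frames 2-3) and a slower recording `obs = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]`, the path stretches each reference word over three recorded frames, and `word_times` converts those frames to seconds.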