Many YouTube videos have automatic captions for song lyrics. We believe YouTube generates them with the Google Speech Recognition API. However, when we use the Google Speech Recognition API (or any other speech recognition API) ourselves, we do not get accurate lyrics. Sometimes we get only one line of the song. Why might this be?
Does anyone have suggestions for getting the lyrics of a song in real time? Or an API or library for training on audio?
Thank you for your help!
In case anyone else is wondering, the youtube-transcript-api Python package can be used to get the transcripts/subtitles for a given YouTube video.
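
Here is a rough sketch of how that might look. It assumes the package is installed (`pip install youtube-transcript-api`) and that the video already has captions, manual or auto-generated. The helper names `fetch_entries`, `to_timed_lines`, and `format_timestamp` are made up for this example. The package's interface has changed between releases, so the sketch tries the older `get_transcript` call first and falls back to the newer `fetch(...).to_raw_data()` form. Please check the README for the version you have installed rather than trusting this as-is.

```python
from typing import Iterable, List, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.ss."""
    minutes, secs = divmod(float(seconds), 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def to_timed_lines(entries: Iterable[Mapping]) -> List[str]:
    """Turn caption entries into '[MM:SS.ss] text' lines.

    Each entry is assumed to be a mapping with at least the keys
    'text' and 'start' (start time in seconds).
    """
    return [f"[{format_timestamp(e['start'])}] {e['text']}" for e in entries]


def fetch_entries(video_id: str) -> list:
    """Fetch caption entries for a video ID (needs network access).

    The import is done here so the formatting helpers above still work
    when the package is not installed.
    """
    from youtube_transcript_api import YouTubeTranscriptApi

    # Older releases expose a static get_transcript(); newer ones use an
    # instance with fetch(). Which one applies depends on your version.
    if hasattr(YouTubeTranscriptApi, "get_transcript"):
        return YouTubeTranscriptApi.get_transcript(video_id)
    return YouTubeTranscriptApi().fetch(video_id).to_raw_data()
```

Calling `to_timed_lines(fetch_entries(video_id))` with the ID from a video URL (the part after `v=`) should then give one timestamped line per caption. Note that this only retrieves captions YouTube already has for the video. It does not run speech recognition on the audio itself.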