python, speech-recognition, speech-to-text, fairseq

Train Wav2Vec-U for a custom dataset


I found the GitHub repo for Wav2Vec-U, but it is not well documented. Is there a documented training procedure for it? I'm trying to train it on Common Voice audio, but it requires wrd, ltr, and phn files, which I don't have.


Solution

  • Currently the best (only?) relevant write-up is this notebook hosted on Kaggle. The comments section of that notebook links to a second notebook that deals specifically with the wrd, ltr, and phn files you ask about.
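
For the wrd and ltr files in particular, here is a minimal sketch of how they could be derived from transcripts you already have (for example, the sentences shipped with Common Voice). It assumes the label layout used by fairseq's wav2vec label scripts: one utterance per line, with the ltr line being the characters of the wrd line separated by spaces and `|` marking each word boundary. The function names `to_wrd`, `to_ltr`, and `write_labels` are hypothetical helpers, not part of fairseq, and the lines must be written in the same order as the entries in your audio manifest. The phn file is not covered here, since it needs a separate grapheme-to-phoneme tool.

```python
from typing import Iterable


def to_wrd(transcript: str) -> str:
    """Collapse runs of whitespace so words are separated by single spaces."""
    return " ".join(transcript.split())


def to_ltr(wrd_line: str) -> str:
    """Spell out a wrd line letter by letter, using '|' as the word boundary.

    Assumption: this mirrors the letter-label convention of fairseq's
    wav2vec examples; check it against the version of the repo you use.
    """
    return " ".join(list(wrd_line.replace(" ", "|"))) + " |"


def write_labels(transcripts: Iterable[str], wrd_path: str, ltr_path: str) -> None:
    """Write one wrd line and one ltr line per transcript, in input order."""
    with open(wrd_path, "w", encoding="utf-8") as wrd_f, \
            open(ltr_path, "w", encoding="utf-8") as ltr_f:
        for transcript in transcripts:
            wrd = to_wrd(transcript)
            wrd_f.write(wrd + "\n")
            ltr_f.write(to_ltr(wrd) + "\n")
```

For example, `write_labels(["hello world"], "train.wrd", "train.ltr")` would write `hello world` to the wrd file and the matching letter sequence to the ltr file. Any text normalization your language needs (casing, punctuation removal) is left out and would have to be applied before calling `to_wrd`.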