machine-learning deep-learning neural-network voice

Adapt Tortoise-TTS for another language

I watched a YouTube video about voice cloning: https://www.youtube.com/watch?v=Kfr_FZof_hs It's an interesting topic, but this project's repository only supports English: https://colab.research.google.com/drive/1NxiY3zHN4Nd8J3YAqFsbYaOB71IiLE04?usp=sharing#scrollTo=JrK20I32grP6

I want to adapt it for Italian.

I am a beginner in machine learning. What do I need to do to get the TTS model to "learn" Italian? Do I need to train the model on Italian audio files, rebuild the model, or do something else? Any advice would be appreciated.

Solution

The creator of Tortoise-TTS answered this question in the GitHub issue linked at the end of this answer.

Creator's answer:

Here is what you need to train this:

a wav2vec or similar ASR model for your language
at least 10,000 hours of usable spoken language, with no environmental noises, music, etc. This does not need to be transcribed. I used audiobooks and podcasts for English.
approximately 16 months total of V100 GPU time
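
To put the compute figure in perspective, here is a rough back-of-the-envelope sketch. It is not from the creator's answer: the 30-day month, the GPU counts, and the assumption that training time scales perfectly linearly with the number of GPUs are all mine, and the function names are made up for illustration. Real multi-GPU training rarely scales this cleanly.

```python
# Rough estimate only. Assumptions (not from the creator's answer):
# a 30-day month and ideal linear scaling across GPUs.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumed average month length

TOTAL_V100_MONTHS = 16  # figure quoted in the creator's answer


def gpu_hours(gpu_months: float) -> float:
    """Convert GPU-months into GPU-hours under the 30-day assumption."""
    return gpu_months * DAYS_PER_MONTH * HOURS_PER_DAY


def wall_clock_months(gpu_months: float, num_gpus: int) -> float:
    """Wall-clock months if the work splits evenly across num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return gpu_months / num_gpus


if __name__ == "__main__":
    print(f"Total: about {gpu_hours(TOTAL_V100_MONTHS):.0f} V100 GPU-hours")
    for n in (1, 4, 8):  # hypothetical cluster sizes
        months = wall_clock_months(TOTAL_V100_MONTHS, n)
        print(f"{n} GPU(s): about {months:.1f} months of wall-clock time")
```

Even under these optimistic assumptions, a single V100 would be busy for well over a year, so training a new language from scratch is not a realistic beginner project without access to a multi-GPU machine.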

https://github.com/neonbjb/tortoise-tts/issues/5