How can I run Mozilla TTS/Coqui TTS training with CUDA on a Windows system?

I have a machine with a Quadro P5000 graphics card, running Windows 10. I'd like to train a TTS voice on this system. What do I need to install to make this work?

Solution

Here's what to install/do:

Download and install Python 3.8 (not 3.9+) for Windows. During the installation, ensure that you:

Opt to install it for all users.
Opt to add Python to the PATH.

Download and install CUDA Toolkit 10.1 (not 11.0+).
Download "cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1" (not cuDNN v8+), extract it, and then copy what's inside the cuda folder into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1.
Download and install the latest 64-bit version of eSpeak NG (no version constraints :-) ).
Download and install the latest 64-bit version of Git for Windows (no version constraints :-) ).
Open a PowerShell prompt to a folder where you'd like to install Coqui TTS.
Run git clone https://github.com/coqui-ai/TTS.git.
Run cd TTS.
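Run python -m venv . (the trailing dot is part of the command: it creates the virtual environment in the current TTS folder).
Run .\Scripts\pip install -e . (again with a trailing dot, which installs TTS from the current folder in editable mode).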
Run the following command (this differs from the command you get from the PyTorch website because of a known issue):

.\Scripts\pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
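
At this point the installation steps above can be sanity-checked with a short script. This is only a sketch: the file name check_setup.py and the function check_prerequisites are hypothetical, the paths are the default install locations mentioned above, and the cuDNN file name cudnn64_7.dll is an assumption about what the cuDNN v7 archive contains. Adjust them if your setup differs.

```python
import shutil
import sys
from pathlib import Path

# Default locations from the steps above; change these if you installed elsewhere.
CUDA_DIR = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1")
ESPEAK_EXE = Path(r"C:\Program Files\eSpeak NG\espeak-ng.exe")


def check_prerequisites(cuda_dir=CUDA_DIR, espeak_exe=ESPEAK_EXE):
    """Return a dict mapping each check name to True (found) or False (missing)."""
    cuda_dir = Path(cuda_dir)
    return {
        # The steps above call for Python 3.8 specifically.
        "python_3_8": sys.version_info[:2] == (3, 8),
        "cuda_toolkit_dir": cuda_dir.is_dir(),
        # Assumed name of the cuDNN v7 DLL after copying it into the CUDA folder.
        "cudnn_dll": (cuda_dir / "bin" / "cudnn64_7.dll").is_file(),
        "espeak_ng_exe": Path(espeak_exe).is_file(),
        "git_on_path": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Save it in the TTS folder and run it via .\Scripts\python ./check_setup.py. Any line that prints MISSING points at the step to revisit.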

Put the following into a script called "test_cuda.py" in the TTS folder:

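import torch
# Build a small random tensor to confirm that PyTorch itself works.
x = torch.rand(5, 3)
print(x)
# This must print True for GPU training to work.
print(torch.cuda.is_available())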

Run the script via .\Scripts\python ./test_cuda.py and confirm the output looks like this. The tensor values are random and will differ on your machine, but the last line must read True. If it reads False, PyTorch cannot use the GPU, which usually means CUDA, cuDNN, or the CUDA build of PyTorch is not installed properly:

tensor([[0.2141, 0.7808, 0.9298],
        [0.3107, 0.8569, 0.9562],
        [0.2878, 0.7515, 0.5547],
        [0.5007, 0.6904, 0.4136],
        [0.2443, 0.4158, 0.4245]])
True
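
If the last line reads False, a slightly more detailed report can help narrow down the cause. The following is a minimal sketch; the file name cuda_report.py and the function cuda_report are hypothetical, and it only uses standard PyTorch attributes. A built_for_cuda value of None means a CPU-only PyTorch build was installed, in which case the pip command above needs to be rerun.

```python
def cuda_report():
    """Collect basic facts about the PyTorch/CUDA installation."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Only query the GPU and cuDNN when CUDA is actually usable.
        report["gpu_name"] = torch.cuda.get_device_name(0)
        report["cudnn_version"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Run it the same way as the test script, via .\Scripts\python ./cuda_report.py.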

Put the following into a script called "train.bat" in the TTS folder, and then customize it for your configuration file:

set PYTHONIOENCODING=UTF-8
set PYTHONLEGACYWINDOWSSTDIO=UTF-8
set PHONEMIZER_ESPEAK_PATH=C:/Program Files/eSpeak NG/espeak-ng.exe

.\Scripts\python.exe ./TTS/bin/train_tacotron.py --config_path "C:/path/to/your/config.json"

Run the script via .\train.bat.

If you are using a different model than Tacotron or need to pass other parameters into the training script, feel free to further customize train.bat.
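
If you would rather customize the launch from Python than from a batch file, the same setup can be sketched as a small launcher. This is an illustration, not part of Coqui TTS: the file name train_launcher.py and the helper functions are hypothetical, and the environment variables and script path are simply copied from train.bat above.

```python
import os
import subprocess
import sys


def build_training_env(espeak_path="C:/Program Files/eSpeak NG/espeak-ng.exe"):
    """Copy the current environment and add the variables that train.bat sets."""
    env = os.environ.copy()
    env["PYTHONIOENCODING"] = "UTF-8"
    env["PYTHONLEGACYWINDOWSSTDIO"] = "UTF-8"
    env["PHONEMIZER_ESPEAK_PATH"] = espeak_path
    return env


def build_training_command(config_path, script="./TTS/bin/train_tacotron.py"):
    """Build the command line, reusing the interpreter that runs this launcher."""
    return [sys.executable, script, "--config_path", config_path]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python train_launcher.py C:/path/to/your/config.json")
    else:
        # check=True raises an error if the training script exits with a failure.
        subprocess.run(
            build_training_command(sys.argv[1]),
            env=build_training_env(),
            check=True,
        )
```

Run it from the TTS folder with the virtual environment's interpreter, for example .\Scripts\python ./train_launcher.py "C:/path/to/your/config.json", so that sys.executable points at the Python that has TTS installed. To train a different model, pass another script path to build_training_command.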

If you are just getting started with TTS training in general, take a peek at "How do I get started training a custom voice model with Mozilla TTS on Ubuntu 20.04?".