text-to-speech, web-audio-api, azure-cognitive-services, audio-processing

Best way to create text-to-speech voice variants


I need at least three or four different TTS voices, but unfortunately I have only one.

That is because I have only one Italian neural voice (Diego); the others are all standard voices, and their quality is much worse.

The final goal is to create a voice-over for at least three or four people, and I can't use the exact same voice for all of them.

For this reason, I would like to create some variants starting from the one neural voice I have, so that they give the impression of different people speaking without sounding unnatural.

Currently I have Adobe Audition, Audacity, Ircam Trax, and ffmpeg; apart from these, I can use SSML through the API (in this case Microsoft Azure).

I don't know which effects to use, or how strongly to apply them without damaging the voice.

In short, what is the best way to do this with the software I have, or with other software if it would give better results?

Thanks!


Solution

  • What language are you using? If you are using English, I am sure you can find more than 3-4 neural voices. There are en-US, en-GB, en-CA, and en-AU neural voices, and they all sound natural.

    You can also adjust the pitch using SSML to make the voice sound different.

    If you would like to create different voices, try customvoice.ai with your own speech data (or recordings from your voice talents).

    Or, what particular 'variances' are you looking for?
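
To make the SSML pitch suggestion concrete, here is a minimal sketch that builds one SSML document per character from a single voice, using the `<prosody>` element's `pitch` and `rate` attributes. Everything specific in it is an assumption rather than something from the thread: the voice name `it-IT-DiegoNeural`, the preset names, and the offset values are illustrative placeholders that would need tuning by ear. It only produces the SSML string; sending it to the speech service is left out.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed voice name; check the voice list available in your Azure region.
VOICE = "it-IT-DiegoNeural"

# Hypothetical character presets. The offsets are kept small on purpose,
# on the assumption that large shifts are what make a voice sound artificial.
PRESETS = {
    "narrator": {"pitch": "+0%", "rate": "+0%"},
    "speaker_a": {"pitch": "-6%", "rate": "-5%"},
    "speaker_b": {"pitch": "+6%", "rate": "+4%"},
}


def build_ssml(text, preset, voice=VOICE, lang="it-IT"):
    """Return an SSML string that wraps `text` in the prosody of `preset`."""
    p = PRESETS[preset]
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody pitch={quoteattr(p["pitch"])} rate={quoteattr(p["rate"])}>'
        f'{escape(text)}'  # escape &, <, > so the SSML stays well-formed
        '</prosody></voice></speak>'
    )


if __name__ == "__main__":
    for name in PRESETS:
        print(name, build_ssml("Buongiorno a tutti.", name))
```

Each resulting string can be passed to whatever SSML synthesis call you already use. Combining a small pitch offset with a small rate offset per character is just one way to separate the speakers; the right amounts depend on the material.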