google-text-to-speech

Is it possible to specify Emotions for specific lines of text in Google Text-To-Speech?


I'm trying to create a Children's Animation / Short Story and put it on YouTube. I'm hoping to use Google Translate and Text-To-Speech to generate different language versions.

Since I'll need voices to express different emotions, I was wondering whether there is a way to do the following:

  1. Have different voices for boys, girls, grown-ups, animals, etc.?
  2. For each line, specify an emotion, e.g. angry, sad, or excited.

Can Google's Text-To-Speech allow for this customization? Thanks.


Solution

  • In Google Text-To-Speech it is not possible to assign emotions to voices. Currently, the only voice options are adult male and adult female voices in different languages. See the list of available voices here. Some voices in the list use the WaveNet model, which makes them sound more like a real adult speaker.

    The customization that Google's Text-To-Speech offers uses Speech Synthesis Markup Language (SSML) and is currently limited to specifying pauses and audio formatting for acronyms, dates, times, abbreviations, or text that should be censored.

    I suggest exploring other Text-to-Speech providers that fit your use case.
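
For reference, the kind of SSML customization described above (pauses, acronyms, dates, censored text) could look roughly like the sketch below. It only builds the SSML string with the Python standard library; the helper names `pause`, `say_as` and `build_ssml` and the story text are made up for illustration, and it assumes the standard `<break>` and `<say-as>` elements are what you want to use.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(milliseconds: int) -> str:
    """Return an SSML <break> element that inserts a pause (hypothetical helper)."""
    return f'<break time="{milliseconds}ms"/>'


def say_as(text: str, interpret_as: str, **attrs: str) -> str:
    """Wrap text in an SSML <say-as> element (hypothetical helper)."""
    extra = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return (
        f"<say-as interpret-as={quoteattr(interpret_as)}{extra}>"
        f"{escape(text)}</say-as>"
    )


def build_ssml(*parts: str) -> str:
    """Join already-escaped fragments inside a single <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


# Illustrative story line: a pause, a date, a spelled-out acronym, a censored word.
ssml = build_ssml(
    escape("Once upon a time,"),
    pause(500),
    escape("on"),
    say_as("2024-12-25", "date", format="yyyymmdd"),
    escape("a robot named"),
    say_as("TTS", "characters"),
    escape("said"),
    say_as("darn", "expletive"),
)

print(ssml)
```

If you use the `google-cloud-texttospeech` Python client, a string like this would go into `texttospeech.SynthesisInput(ssml=ssml)`. Note that none of these elements sets an emotion; they only control timing and how specific tokens are read.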