unity-game-engine, text-to-speech, hololens, windows-mixed-reality, ssml

MRTK TextToSpeech.SpeakSsml doesn't work when using the <voice /> element. Device: HoloLens 2


I am using Unity and MRTK to develop an application for HoloLens 2. I am trying to use "speech styles" with the MRTK TextToSpeech.SpeakSsml method (MRTK API Reference). Text-to-speech works; however, I am unable to apply speech styles. Example SSML:

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <mstts:express-as style="cheerful">
      Cheerful hello!
    </mstts:express-as>
    <break time="1s" />
    <mstts:express-as style="angry">
      Angry goodbye!
    </mstts:express-as>
</speak>

My guess is that the default voice does not support speech styles. However, if I add a voice element to select another voice (the documentation lists four available voices), TextToSpeech stops working entirely. So I am facing two problems:

  1. When using the SpeakSsml method instead of StartSpeaking, the selected voice (TextToSpeech.Voice) is disregarded, and I am unable to change it using the voice element.
  2. I couldn't find documentation listing which SSML elements are supported by the voices available in the MRTK TextToSpeech class.

Any ideas or useful links?

Thank you!


Solution

  • The TextToSpeech class provided by MRTK depends on the Windows 10 SpeechSynthesizer class, so it works offline and does not support adjusting speaking styles. The mstts:express-as element is only available in the Azure Speech Service; for more information, please refer to this documentation: Improve synthesis with Speech Synthesis Markup Language (SSML)
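
If a rough approximation is acceptable, standard SSML prosody attributes may get part of the way toward a "style" without the Azure service. The following is a minimal sketch, assuming the on-device Windows synthesizer honors the W3C SSML 1.0 prosody element. The pitch, rate, and volume values are illustrative guesses and have not been verified on a HoloLens 2:

```xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <!-- Assumption: a higher pitch and faster rate stand in for "cheerful". -->
    <prosody pitch="high" rate="fast">
      Cheerful hello!
    </prosody>
    <break time="1s" />
    <!-- Assumption: a lower pitch, slower rate, and louder volume stand in for "angry". -->
    <prosody pitch="low" rate="slow" volume="loud">
      Angry goodbye!
    </prosody>
</speak>
```

This drops the mstts namespace entirely, since the offline synthesizer does not provide the Azure-only extensions. It will not sound like a true express-as style; it only changes how the existing voice is delivered.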