azure, text-to-speech, azure-cognitive-services

Azure multi-lingual voices seem deprecated; how to find the right ones?


I am trying to use Microsoft Speech SDK with a multilingual neural voice in European Portuguese.

The documentation says that en-US-RyanMultilingualNeural supports pt-PT, as does en-US-JennyMultilingualV2Neural, at least for the time being.

This code works for pt-BR, and I set up logging following the documentation:

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="<your-region>")
speech_config.set_property(speechsdk.PropertyId.Speech_LogFilename, "/path/to/log/file")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
ssml_string = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-JennyMultilingualNeural">
<lang xml:lang="pt-BR">
    Bom dia! Eu sou um assistente que fala em português.
</lang>
</voice>
</speak>
"""
result = synthesizer.speak_ssml_async(ssml_string).get()
if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("SSML string is incorrect: {}".format(result))
else:
    print("SSML string is correct")

When I change the language to pt-PT and the voice to en-US-RyanMultilingualNeural or en-US-JennyMultilingualV2Neural, I get this error:

SSML string is incorrect: SpeechSynthesisResult(result_id=..., reason=ResultReason.Canceled, audio_length=0)

The log file has this line, but it provides little information:

[860253]: 1106ms SPX_DBG_TRACE_VERBOSE:  named_properties.h:478 ISpxPropertyBagImpl::SetStringValue: this=0x0x007f8cddb47670; name='CancellationDetails_ReasonDetailedText'; value='Connection was closed by the remote host. Error code: 1007. Error details: Unsupported voice en-US-RyanMultilingualNeural. USP state: TurnStarted. Received audio size: 0 bytes.'
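
For reference, the same error text can usually be read from the result object without digging through the log. The sketch below assumes the Python Speech SDK exposes cancellation_details (with reason and error_details) on a canceled synthesis result; format_cancellation and speak_and_report are hypothetical helper names, not part of the SDK.

```python
def format_cancellation(details):
    """Build a one-line summary from a cancellation-details object.

    `details` is expected to expose `.reason` and `.error_details`,
    as the SDK's synthesis cancellation details do (hypothetical helper).
    """
    message = "Synthesis canceled: {}".format(details.reason)
    if details.error_details:
        message += " | {}".format(details.error_details)
    return message


def speak_and_report(ssml_string, key, region):
    """Synthesize SSML and print the service's error text on cancellation."""
    # Imported here so format_cancellation stays usable without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
    result = synthesizer.speak_ssml_async(ssml_string).get()
    if result.reason == speechsdk.ResultReason.Canceled:
        print(format_cancellation(result.cancellation_details))
    return result
```

With the failing SSML, this should print the "Unsupported voice ..." text that otherwise only appears in the verbose log.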

When I change the language to en-US, both voices throw this error. So these voices seem to be deprecated.

Is the documentation out of date? If so, how can I find the right names for the multilingual voices?


Solution

  • As of December 2023, the two voices are in public preview and are only available in certain regions. They are expected to reach general availability (GA) within the next week.
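
Since availability depends on the region, one way to find the voice names your resource actually accepts is to ask the service for its voice list. This is a minimal sketch, assuming the SDK's get_voices_async call and the short_name attribute on the returned voice entries; multilingual_voice_names and list_multilingual_voices are hypothetical helpers, and filtering on the substring "Multilingual" is a naming heuristic rather than an official flag.

```python
def multilingual_voice_names(voices):
    """Return the sorted short names that contain 'Multilingual'."""
    return sorted(v.short_name for v in voices if "Multilingual" in v.short_name)


def list_multilingual_voices(key, region):
    """Fetch the voice list for a region and keep the multilingual ones."""
    # Imported here so the filter above can be used without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # audio_config=None: no audio output is needed just to list voices.
    synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config, audio_config=None
    )
    result = synthesizer.get_voices_async().get()
    if result.reason != speechsdk.ResultReason.VoicesListRetrieved:
        raise RuntimeError(
            "Could not retrieve voices: {}".format(result.error_details)
        )
    return multilingual_voice_names(result.voices)
```

If a voice named in the documentation is missing from the list returned for your region, trying a region where the preview is offered is the next thing to check.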