Tags: azure, text-to-speech, azure-cognitive-services

Azure neural TTS; lexicon (blob storage) ignored


Using Azure neural TTS voices via the Python Speech SDK, I am trying to get a custom lexicon to be applied. Yes, I've spent hours reading and trying things already.

I've read that the lexicon file must be hosted in Azure Blob Storage or on GitHub. I've created the blob storage and ensured the file is anonymously readable. I get audio output, but the phrase "BTW" in the SSML is pronounced as "By the way", which is the built-in default alias, not the one I provided in my lexicon.
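As a quick sanity check (not part of the original troubleshooting), fetching the lexicon URL from plain Python rules out an access problem on the blob. The URL below reuses the redacted <mynamespace> placeholder from the SSML further down:

# Sanity check: confirm the lexicon blob is anonymously readable.
# "<mynamespace>" is the redacted storage account placeholder from this post.
import requests

url = "https://<mynamespace>.blob.core.windows.net/ttsfiles/lexicon3.xml"
resp = requests.get(url)
print(resp.status_code)   # 200 means the blob is publicly fetchable
print(resp.text[:200])    # should show the start of the PLS XML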

Publicly readable lexicon file

<?xml version="1.0" encoding="utf-8"?>
<lexicon xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd" version="1.0" alphabet="ipa" xml:lang="en-US" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon">
  <lexeme>
    <grapheme>BTW</grapheme>
    <alias>By the flippin' way</alias>
  </lexeme>
</lexicon>

SSML

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" version="1.0" xml:lang="en-US">
<voice name="en-US-EmmaNeural">
<lexicon uri="https://<mynamespace>.blob.core.windows.net/ttsfiles/lexicon3.xml"/>
The phrase is: BTW
</voice></speak>
  • storage account name redacted
  • the suffix number on the filename, which I increment to get around the 15-minute lexicon caching rule
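For context, here is roughly how that SSML gets submitted with the Python Speech SDK (the azure-cognitiveservices-speech package). This is a minimal sketch, assuming the SSML above is saved to ssml.xml; the key and region are placeholders:

# Minimal sketch: synthesize the SSML above with the Python Speech SDK.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
audio_config = speechsdk.audio.AudioOutputConfig(filename="out.wav")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                          audio_config=audio_config)

with open("ssml.xml", encoding="utf-8") as f:
    ssml = f.read()

result = synthesizer.speak_ssml_async(ssml).get()
if result.reason == speechsdk.ResultReason.Canceled:
    # Surface errors (bad SSML, auth, etc.) rather than failing silently.
    print(result.cancellation_details.error_details)

Note that malformed SSML surfaces as a cancelled result rather than an exception, while an unsupported or unreachable lexicon is simply ignored with a successful result, which is why this failure mode is easy to miss.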

Solution

  • Reading the docs more closely, custom lexicons are not supported by the specific neural voices I was using (including en-US-EmmaNeural above). It's helpful to use the Speech Studio to debug; a voice-by-voice check is sketched below.
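Since lexicon support varies by voice, one way to confirm which voices honor the file is to render the same lexicon-referencing SSML with several candidates and listen for the custom alias. A minimal sketch; the alternate voice names are examples only, not voices confirmed to support custom lexicons:

# Hypothetical A/B check: synthesize the same SSML with several voices and
# compare the output files by ear. Voice names are examples only.
import azure.cognitiveservices.speech as speechsdk

SSML = """<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
<voice name="{voice}">
<lexicon uri="https://<mynamespace>.blob.core.windows.net/ttsfiles/lexicon3.xml"/>
The phrase is: BTW
</voice></speak>"""

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
for voice in ("en-US-EmmaNeural", "en-US-JennyNeural", "en-US-GuyNeural"):
    audio = speechsdk.audio.AudioOutputConfig(filename=f"{voice}.wav")
    synth = speechsdk.SpeechSynthesizer(speech_config=speech_config,
                                        audio_config=audio)
    synth.speak_ssml_async(SSML.format(voice=voice)).get()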