text-to-speech, speech, sapi

Where is the Sayaka voice in Speech API OneCore?


This is on Windows 10. I've installed the Japanese TTS voices through Settings. When I enumerate voices with Speech API 5.4 OneCore (not with SAPI 5.4 proper), I get 6 voices:

  • David
  • Zira
  • Ayumi
  • Haruka
  • Mark
  • Ichiro

The Speech settings page also shows those 6. But there's clearly a seventh one in the registry, Sayaka (HKLM\SOFTWARE\WOW6432Node\Microsoft\Speech_OneCore\Voices\Tokens\MSTTS_V110_jaJP_SayakaM). Its files are present under C:\windows\Speech_OneCore\Engines\TTS\ja-JP. Compared to the rest, there's an extra file, .heq. Why doesn't it enumerate?

The enumeration code goes:

    #import "libid:E6DA930B-BBA5-44DF-AC6F-FE60C1EDDEC8" rename_namespace("SAPI") //v5.4 OneCore

    HRESULT hr;
    SAPI::ISpVoicePtr v;
    v.CreateInstance(__uuidof(SAPI::SpVoice));
    SAPI::ISpObjectTokenPtr tok;
    hr = v->GetVoice(&tok); //Retrieve the default voice
    SAPI::ISpObjectTokenCategoryPtr cat;
    hr = tok->GetCategory(&cat); //Retrieve the voices category
    SAPI::IEnumSpObjectTokensPtr toks;
    hr = cat->EnumTokens(0, 0, &toks);

    //And enumerate
    unsigned long i, n;
    hr = toks->GetCount(&n);
    LPWSTR ws;
    for (i = 0; i < n; i++)
    {
        hr = toks->Item(i, &tok);
        hr = tok->GetId(&ws);
        CoTaskMemFree(ws);
    }

The only other mention of Sayaka online that I could find is here

Edit

Enumerating by Reset()/Next() gives the same 6. Trying to create a token directly around the registry path gives error 0x8004503a (SPERR_NOT_FOUND). Doing so while watching with Process Monitor reveals an interesting fact: rather than Sayaka under HKLM, the process interrogates the following key:

HKCU\Software\Microsoft\Speech_OneCore\Isolated\7WUiMB20NMV5Y7TgZ2WJXbUw32iGZQSvSkeaf0AevtQ\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens\MSTTS_V110_jaJP_SayakaM

There's indeed a key like that under HKCU, and it contains a copy of HKLM and HKCU settings for SAPI, and there's indeed no Sayaka under Voices in that key. Just the six I've mentioned.
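To make the redirection concrete, here is a minimal sketch of how the HKCU-relative path appears to be composed from the isolation ID and the original HKLM subkey. `IsolatedKeyPath` is a hypothetical helper, not a SAPI function, and the layout is inferred from the single Process Monitor trace above, so treat it as an assumption rather than documented behavior.

```cpp
#include <string>

// Hypothetical helper (not part of SAPI): builds the HKCU-relative path that
// the process was seen querying instead of the real HKLM key.
// Assumption: the layout is Isolated\<id>\HKEY_LOCAL_MACHINE\<original subkey>,
// as observed in one Process Monitor trace.
std::wstring IsolatedKeyPath(const std::wstring& isolationId,
                             const std::wstring& hklmSubkey)
{
    return L"Software\\Microsoft\\Speech_OneCore\\Isolated\\" + isolationId +
           L"\\HKEY_LOCAL_MACHINE\\" + hklmSubkey;
}
```

Passing the isolation ID from the trace and the Sayaka token subkey should reproduce the path shown above (minus the leading HKCU), which could then be handed to RegOpenKeyEx under HKEY_CURRENT_USER for inspection.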

So there's some kind of isolation going on, with SAPI settings in several copies. There are 7 different subkeys under Isolated, and the voice sets are different under those. Two contain voices that have nothing in common with the ones we know, and those have to do with Cortana. Hard to tell what's the unit of isolation - maybe a user, maybe an app package (in the UWP sense).
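Comparing the voice sets of two such keys is a plain set difference. A small sketch, assuming the token names have already been read from the registry into two sets; `MissingVoices` is a made-up name, and the registry enumeration itself is left out here.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the voices that are present in the master list
// (e.g. under HKLM) but absent from an isolated profile.
std::vector<std::wstring> MissingVoices(const std::set<std::wstring>& master,
                                        const std::set<std::wstring>& isolated)
{
    std::vector<std::wstring> missing;
    // std::set iterates in sorted order, which std::set_difference requires
    std::set_difference(master.begin(), master.end(),
                        isolated.begin(), isolated.end(),
                        std::back_inserter(missing));
    return missing;
}
```

With the seven names found under HKLM as the master set and the six enumerated names as the isolated set, the result would contain only Sayaka.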

Edit

As I suspected, there's app-package-based isolation going on. I created a brand new project with the same code, ran it, and got a different isolation key: F2yLLxINh6S1e3y3MkJo4ilfh036RB_9pHLEVL88yL0. It looks like every time a SAPI-enabled application runs, SAPI derives an isolation profile from the current executable. That profile didn't exist a moment earlier, so SAPI must have created it on the fly. I don't think the voices are hard-coded, so it must have copied the voices into the isolation profile from somewhere, some master list.

Where is the master list? It's not HKLM\...\Speech_OneCore, since one can see Sayaka is there. It could be tokens_TTS_ja-JP.xml under C:\Windows\SysWOW64\Speech_OneCore\Common\ja-JP, since Ayumi/Ichiro/Haruka are listed there but Sayaka isn't. The security on that file is quite draconian though, I'm having trouble editing that file even with admin rights. Also, it's a second hardlink to C:\Windows\WinSxS\wow64_microsoft-windows-t..peech-ja-jp-onecore_31bf3856ad364e35_10.0.18362.1_none_46741f8a666da90a.

The SysWOW64\Speech_OneCore folder allows write for administrators, but SysWOW64\Speech_OneCore\Common doesn't. Only TrustedInstaller can write it.

By the way, the isolation logic is specific to OneCore. SetId() in SAPI 5.4 proper looks in the key that matches the provided Id.


Alternative approach: the SAPI 5.4 docs mention the ISpRegDataKey interface, which lets one initialize a token directly from an HKEY. It's not in the typelib, though.


Solution

  • If the isolation registry key doesn't have Sayaka, but HKLM does, an application can copy the Sayaka token to the isolation key on the first run. The key insight here is that the isolation key is writable without elevation, and SAPI supports creating and populating tokens. This doesn't rely on the specifics of isolation. Create a token with a hard-coded ID for Sayaka, and copy the properties and the attributes from HKLM. Like this:

    #import "libid:E6DA930B-BBA5-44DF-AC6F-FE60C1EDDEC8" rename_namespace("SAPI") //v5.4 OneCore
    
    //Get the default voice to avoid hard-coding the category
    SAPI::ISpVoicePtr v;
    SAPI::ISpObjectTokenPtr tok;
    v.CreateInstance(__uuidof(SAPI::SpVoice));
    v->GetVoice(&tok);
    LPWSTR ws;
    tok->GetId(&ws);
    wchar_t TokID[200];
    wcscpy_s(TokID, ws);
    CoTaskMemFree(ws);
    
    //Check if Sayaka is already registered in SAPI
    SAPI::ISpObjectTokenCategoryPtr cat;
    tok->GetCategory(&cat); //The category of voices
    SAPI::IEnumSpObjectTokensPtr toks;
    cat->EnumTokens(L"name=Microsoft Sayaka", 0, &toks);
    unsigned long n;
    toks->GetCount(&n);
    
    if (n == 0) //Sayaka is not registered already
    {
        //Is Sayaka present under HKLM\..\Voices\Tokens?
        HKEY hkSayaka, hkAttrs;
        if (RegOpenKeyEx(HKEY_LOCAL_MACHINE, L"SOFTWARE\\Microsoft\\Speech_OneCore\\Voices\\Tokens\\MSTTS_V110_jaJP_SayakaM", 0, KEY_READ, &hkSayaka) == ERROR_SUCCESS)
        {
            if (RegOpenKeyEx(hkSayaka, L"Attributes", 0, KEY_READ, &hkAttrs) == ERROR_SUCCESS)
            {
                //If yes, create a Sayaka token where SAPI OneCore thinks it should be!
    
                //Replace the final path component of the default voice's ID with Sayaka
                LPWSTR pbs = wcsrchr(TokID, L'\\');
                wcscpy_s(pbs + 1, _countof(TokID) - (pbs - TokID) - 1, L"MSTTS_V110_jaJP_SayakaM");
                tok.CreateInstance(__uuidof(SAPI::SpObjectToken));
                //Note the 1 in the third parameter - "create if needed"
                HRESULT hr = tok->SetId(0, (LPWSTR)TokID, 1);
    
                DWORD dwi;
                wchar_t ValName[100]; //Enough
                unsigned char ValData[1000]; //Enough
                DWORD ValNameLen, ValDataLen, Type;
    
                //Copy all values from the Sayaka key
                //They are all strings
                for (dwi = 0; RegEnumValue(hkSayaka, dwi, ValName, &(ValNameLen = _countof(ValName)), 0, &Type, ValData, &(ValDataLen = sizeof(ValData))) == ERROR_SUCCESS; dwi++)
                    tok->SetStringValue(ValName, (LPWSTR)ValData);
    
                //Copy all attributes from the Sayaka\Attributes key
                //All strings too.
                SAPI::ISpDataKeyPtr attrs;
                tok->CreateKey((LPWSTR)L"Attributes", &attrs);
                for (dwi = 0; RegEnumValue(hkAttrs, dwi, ValName, &(ValNameLen = _countof(ValName)), 0, &Type, ValData, &(ValDataLen = sizeof(ValData))) == ERROR_SUCCESS; dwi++)
                    attrs->SetStringValue(ValName, (LPWSTR)ValData);
    
                RegCloseKey(hkAttrs);
            }
            RegCloseKey(hkSayaka);
        }
    }
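The token ID rewrite in the code above relies on a fixed 200-character buffer and on wcsrchr finding a backslash. A sketch of the same step with std::wstring, which avoids both assumptions; `ReplaceLastComponent` is a hypothetical helper name, and returning an empty string for a malformed ID is my own convention, not anything SAPI defines.

```cpp
#include <string>

// Hypothetical helper: replaces the final backslash-separated component of a
// token ID with newName. Returns an empty string if the ID has no backslash,
// so the caller can skip the SetId() call instead of dereferencing a null
// pointer.
std::wstring ReplaceLastComponent(const std::wstring& tokenId,
                                  const std::wstring& newName)
{
    const std::wstring::size_type pos = tokenId.rfind(L'\\');
    if (pos == std::wstring::npos)
        return std::wstring();
    // Keep everything up to and including the last backslash
    return tokenId.substr(0, pos + 1) + newName;
}
```

The result's c_str() can be passed to SetId() in place of TokID, after checking that it is not empty.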
    

    A similar approach to exposing the hidden TTS voices is described here: https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/


    Since my original problem was limited to one TTS-enabled app, I'm going to accept this answer and not the other one. That said, the whole issue with not inviting Sayaka to the party is probably a Microsoft oversight that they should ultimately address. Feel free to upvote my Feedback Hub request. Windows 10 users only.