Tags: visual-c++, visual-studio-2015, text-to-speech, sapi

SAPI 5 TTS Events


I'm looking for advice on a problem with the SAPI engine. I have an application that can speak both to the speakers and to a WAV file. I also need to receive some events, namely word boundary and end of input.

    m_cpVoice->SetNotifyWindowMessage(m_hWnd, TTS_MSG, 0, 0);
    hr = m_cpVoice->SetInterest(SPFEI_ALL_EVENTS, SPFEI_ALL_EVENTS);

Just as a test, I subscribed to all events. When the engine speaks to the speakers, all events are triggered and sent to the m_hWnd window, but when I set the output to the WAV file, none of them are sent:

    CSpStreamFormat fmt;  
    CComPtr<ISpStreamFormat> pOld;

    m_cpVoice->GetOutputStream(&pOld);
    fmt.AssignFormat(pOld);
    SPBindToFile(file, SPFM_CREATE_ALWAYS, &m_wavStream, &fmt.FormatId(), fmt.WaveFormatExPtr());
    m_cpVoice->SetOutput(m_wavStream, false);
    m_cpVoice->Speak(L"Test", SPF_ASYNC, 0);

Here, file is a path passed as an argument.

This code is actually taken from the TTS samples in the SAPI SDK. The part that sets the format seems a little obscure...

Can you help me find the problem? Or does anyone know a better way to write TTS output to a WAV file? I cannot use managed code, so a C++ solution would be preferable...

Thank you very much for your help.

EDIT 1

This seems to be a threading problem. Searching in the spuihelp.h file, which contains the SPBindToFile helper, I found that it uses the CoCreateInstance() function to create the stream. Maybe this is where the ISpVoice object loses its ability to send events in its creation thread.

What do you think about that?


Solution

  • I adopted a quick solution that I think should be acceptable in most cases. In fact, when you write speech to a file, the main event you want to be aware of is the "stop" event.

    So, take a look at the class definition:

        #define TTS_WAV_SAVED_MSG            5000
        #define TTS_WAV_ERROR_MSG            5001
    
        class CSpeech { 
        public:
            CSpeech(HWND); // needed for the notifications
            ...
        private:
            HWND m_hWnd;
            CComPtr<ISpVoice> m_cpVoice;
            ...
            std::thread* m_thread;
    
            void WriteToWave();
            void SpeakToWave(LPCWSTR, LPCWSTR);
        };
    

    I implemented the SpeakToWave method as follows:

        // Global variables (***)
        LPCWSTR tMsg;
        LPCWSTR tFile;
        long tRate;
        HWND tHwnd;
        ISpObjectToken* pToken;
    
        void CSpeech::SpeakToWave(LPCWSTR file, LPCWSTR msg) {
            // Using, for example wcscpy_s:
            // tMsg <- msg;
            // tFile <- file;
    
            tHwnd = m_hWnd;
            m_cpVoice->GetRate(&tRate);
            m_cpVoice->GetVoice(&pToken);
    
            if(m_thread == NULL)
                m_thread = new std::thread(&CSpeech::WriteToWave, this);
        }
    

    And now take a look at the WriteToWave() method:

        void CSpeech::WriteToWave() {
            // create a new ISpVoice that exists only in this
            // new thread, so we need to 
            //
            // CoInitialize(...) and...
            // CoCreateInstance(...)
    
            // Now set the voice, i.e. 
            //    rate with global tRate, 
            //    voice token with global pToken
            //    output format and...
            //    bind the stream using tFile as I did in the 
            //      code listed in my question
    
            cpVoice->Speak(tMsg, SPF_PURGEBEFORESPEAK, 0);
            ...
    

    Now, because we did not use the SPF_ASYNC flag, the call is blocking, but because we are on a separate thread, the main thread can continue. After the Speak() method has finished, the new thread can continue as follows:

            ...
            if(/* Speak went OK */)
                ::PostMessage(tHwnd, TTS_WAV_SAVED_MSG, 0, 0);
            else
                ::PostMessage(tHwnd, TTS_WAV_ERROR_MSG, 0, 0);
        }
    

    (***) OK, using global variables is not very elegant :) but I was in a hurry. Maybe passing the parameters to the thread with std::reference_wrapper would be more elegant!

    Obviously, when you receive the TTS messages you need to clean up the thread so it is ready for the next call. This can be done with a CSpeech::CleanThread() method like this:

        void CSpeech::CleanThread() {
            m_thread->join(); // I prefer to be sure the thread has finished!
            delete m_thread;
            m_thread = NULL;
        }
    

    What do you think about this solution? Too complex?
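
Regarding the (***) note, one possible way to avoid the globals is to copy the parameters into the thread itself. The following is only a minimal sketch, not the code I actually used: WaveWriter, SpeakJob and Notify are hypothetical names, and the SAPI work (creating the voice, binding the stream, the blocking Speak() call) and the ::PostMessage() notification are replaced by plain callbacks so that the sketch does not depend on Windows headers. It also assumes C++14 for the lambda init-captures.

```cpp
#include <functional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the blocking SAPI work done in WriteToWave():
// create the voice, bind the WAV stream, call Speak() without SPF_ASYNC.
// Returns true on success.
using SpeakJob = std::function<bool(const std::wstring& file,
                                    const std::wstring& msg,
                                    long rate)>;

// Hypothetical stand-in for ::PostMessage(hwnd, TTS_WAV_SAVED_MSG or
// TTS_WAV_ERROR_MSG, 0, 0).
using Notify = std::function<void(bool ok)>;

class WaveWriter {
public:
    WaveWriter(SpeakJob job, Notify notify)
        : m_job(std::move(job)), m_notify(std::move(notify)) {}

    // Make sure the worker has finished before the object goes away.
    ~WaveWriter() { CleanThread(); }

    // Starts the worker thread. Returns false if a previous worker has not
    // been cleaned up yet (same role as the m_thread == NULL check).
    bool SpeakToWave(std::wstring file, std::wstring msg, long rate) {
        if (m_thread.joinable())
            return false;
        // The strings are moved into the lambda, so the thread owns its own
        // copies and no global variables are needed.
        m_thread = std::thread(
            [this, file = std::move(file), msg = std::move(msg), rate] {
                const bool ok = m_job(file, msg, rate);
                m_notify(ok);
            });
        return true;
    }

    // Call this when the "saved" or "error" notification has been received.
    void CleanThread() {
        if (m_thread.joinable())
            m_thread.join();
    }

private:
    SpeakJob m_job;
    Notify m_notify;
    std::thread m_thread; // held by value, so no new/delete is required
};
```

Holding the std::thread by value and checking joinable() replaces the raw pointer and the NULL test. In a real CSpeech class the job callback would contain the CoInitialize/CoCreateInstance/Speak sequence described above, and the notify callback would post the message to the window.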