
Generate timed-text synchronised with Text-to-Speech word-by-word?


How can I generate timed-text (e.g. for subtitles) synchronised with Text-to-Speech (TTS) word-by-word?

I'd like to do this using high-quality SAPI5 voices (e.g. those available from IVONA here), which I have used on Windows 10.

On Windows we already have some good free TTS programs:

  1. Read4Me - open source
  2. Balabolka - closed source
  3. TTSApp - Microsoft's own very basic GUI, currently available here; it seems to date from 2001.

TTSApp can produce audio files in WAV format. Balabolka creates MP3 files along with synchronised timed-text as LRC files (the format used for karaoke), BUT only on a line-by-line basis, NOT word-by-word.
However, both show word-by-word highlighting on screen, in real time, while they speak aloud.

If I had some TTS/SAPI5 source code I could simply check the clock every time a new word starts to be generated and write the time and that word to a file. Does anyone know of any project that exposes that level of programming - so I might start from there?

UPDATE SEPT 2016

I've since discovered that TTSApp was reimplemented in AutoHotkey by a certain jballi in 2012.

I've adapted that code to append the time in ms to a text file every time the onWord event handler fires. I still need to make two passes:

  1. a rapid automated pass to save the WAV file and
  2. a slow (realtime) pass that creates the timing file.

I am still hoping to find a way to accelerate step 2.
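
The slow real-time pass might look roughly like the sketch below. It is not jballi's code: it is a minimal standalone AutoHotkey v1.1 script, and the sample text, the output file name `timings.txt` and the tab-separated line format are all assumptions made for illustration. It also assumes that the `CharPos` argument of the Word event is a zero-based offset into the spoken text.

```autohotkey
; AutoHotkey v1.1 sketch: log elapsed time (ms) and the word at each SAPI Word event.
#NoEnv
#Persistent                       ; keep the script alive so COM events can be delivered

Text := "The quick brown fox jumps over the lazy dog."   ; sample text (assumption)
LogFile := A_ScriptDir . "\timings.txt"                   ; hypothetical output file

SpVoice := ComObjCreate("SAPI.SpVoice")
ComObjConnect(SpVoice, "On")      ; route SpVoice events to functions prefixed with "On"

FileDelete, %LogFile%             ; start with a fresh log (ignored if the file is absent)
StartTime := A_TickCount
SpVoice.Speak(Text, 1)            ; 1 = SVSFlagsAsync, so events fire while speech plays
return

; Called each time the engine starts speaking a word.
OnWord(StreamNum, StreamPos, CharPos, Length, Voice := "")
{
    global Text, LogFile, StartTime
    Elapsed := A_TickCount - StartTime
    Word := SubStr(Text, CharPos + 1, Length)   ; CharPos assumed 0-based; SubStr is 1-based
    FileAppend, %Elapsed%`t%Word%`n, %LogFile%
}

; Called when the whole text has been spoken.
OnEndStream(StreamNum, StreamPos, Voice := "")
{
    ExitApp
}
```

Because the timestamps come from `A_TickCount`, they are only as precise as the system timer (typically 10 to 16 ms) and include whatever latency there is between the audio and the event delivery.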

BTW, the Visual Basic source appears to be archived here.


Solution

  • It is possible to do all of this offline!

    You generate a WAV file using SAPI while setting the DoEvents argument of SpFileStream.Open to True - documented here.

    A binary representation of each event (e.g. phoneme/word/sentence) gets appended to the end of the WAV file. A certain Hans documented the WAV/SAPI format in 2009 here.

    This can all be done with a simple modification of jballi's 2012 AutoHotkey version of TTSApp.

    Basically, you replace these lines of code in Example1GUI.ahk:

    SpFileStream.Open(SaveToFileName,SSFMCreateForWrite,False)
    
    ;-- Set the output stream to the file stream
    SpVoice.AllowAudioOutputFormatChangesOnNextSet:=False
    SpVoice.AudioOutputStream:=SpFileStream
    
    ;-- Speak using the given flags
    SpVoice.Speak(Text,SpeakFlags)
    

    with the following:

    SpFileStream.Open(SaveToFileName,SSFMCreateForWrite,True) ;-- Third argument is DoEvents: True stores the events in the file
    
    ;-- Set the output stream to the file stream
    SpVoice.AllowAudioOutputFormatChangesOnNextSet:=False
    SpVoice.AudioOutputStream:=SpFileStream
    
    ;-- Connect the SpVoice events to the "On"-prefixed handlers (once only)
    if not Sink
      {
        ComObjConnect(SpVoice, "On")
        Sink:=True
      }
    
    ;-- Speak using the given flags
    SpVoice.Speak(Text,SpeakFlags|SVSFlagsAsync|SVSFPurgeBeforeSpeak)
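
With the events connected while writing to a file, a word handler no longer has to consult the clock: it can derive the time from the stream position it receives. The helpers below are a hedged sketch of that conversion, not part of jballi's script. `StreamPosToMs` and `FormatLrcTime` are hypothetical names, and the sketch assumes that the stream position is a byte offset into uncompressed PCM audio and that the average bytes-per-second figure of the output format is known.

```autohotkey
; AutoHotkey v1.1 sketch (Format() needs v1.1.17 or later).

; Convert a byte offset in PCM audio to milliseconds.
; Assumption: StreamPos counts bytes of audio data, excluding the WAV header.
StreamPosToMs(StreamPos, AvgBytesPerSec)
{
    return Round(StreamPos * 1000 / AvgBytesPerSec)
}

; Format an integer number of milliseconds as an LRC-style time tag [mm:ss.xx].
FormatLrcTime(Ms)
{
    Minutes    := Ms // 60000
    Seconds    := Mod(Ms, 60000) // 1000
    Hundredths := Mod(Ms, 1000) // 10
    return Format("[{:02}:{:02}.{:02}]", Minutes, Seconds, Hundredths)
}

; Example: 22050 Hz, 16-bit, mono audio has 44100 bytes per second,
; so a word starting at byte 88200 begins 2000 ms into the file.
Ms := StreamPosToMs(88200, 44100)
Tag := FormatLrcTime(Ms)          ; [00:02.00]
```

In SAPI automation, the bytes-per-second value should be readable from the file stream as `SpFileStream.Format.GetWaveFormatEx().AvgBytesPerSec`. Inside an `OnWord` handler, the resulting tag could then be written out next to the word in place of a clock reading.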