
How do I stop SAPI.SpVoice reading "is." as "island"?


Can you configure the way SAPI.SpVoice reads text?

In my situation I am reading the current clipboard using an AutoHotkey script. The script makes a COM call to SAPI.SpVoice, passing it the text from the clipboard.

;;;;;;;;;;;;;;;;;;;;TTS;;;;;;;;;;;;;;;;;;;;;;
#^!D:: ; Win + Ctrl + Alt + D
ClipSaved := ClipboardAll   
clipboard = ; Start off empty to allow ClipWait to detect when the text has arrived.
Send ^c
ClipWait  ; Wait for the clipboard to contain text.
ComObjCreate("SAPI.SpVoice").Speak(clipboard)
Clipboard := ClipSaved 
ClipSaved = ; Free the memory 
return 

The problem is that SAPI reads some text incorrectly.

For example:

  • "Yes it is. Ours is complex." is read with "is." pronounced as "island".
  • "Yes it is. This is complex." is read correctly.

You can reproduce this on Windows 7 as follows:

  • Press the Windows key, type "Change text to speech settings" and pick that option.
  • In the dialog, enter "Yes it is. Ours is complex." in the "Use the following text to preview the voice:" field.
  • Press "Preview Voice".
  • Hear it read "is." as "island".

So my question is:

Is it possible to change/configure the way "Microsoft Anna" reads text so it doesn't make these mistakes?

Is this a bug in the Anna voice only or all voices?

How can I make it read the text the way I want it read?


Solution

  • "Every problem (except the problem of too many levels of indirection) can be solved with another level of indirection."

    The SAPI.SpVoice object can be passed plain text (as I was doing) or SSML.

    Converting the text to SSML before speaking it gives you control over how words are spoken: you get a chance to pre-process the text and replace misread words with the pronunciation you want.

    For example: "Yes it is. Ours is complex." becomes "Yes it <sub alias="is">is</sub>. Ours is complex."

    The sub and say-as elements seem to work. The phoneme element seems to be ignored, but I may have something configured wrongly.

    Note: If you want XML read aloud, XML-escape the text before converting it to SSML; otherwise it will be treated as part of the SSML markup.
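
    A minimal sketch of that ordering in Ruby: escape first, then add the markup, so that only the tags you insert are treated as SSML. The helper name escape_then_mark_up and the "is." pattern are made up for illustration; CGI.escapeHTML is the same standard-library call used in the script further down.

    ```ruby
    require 'cgi'

    # Hypothetical helper: XML-escape plain text, then wrap a sentence-final
    # "is." in a <sub> element so the alias is spoken instead.
    def escape_then_mark_up(text)
      escaped = CGI.escapeHTML(text) # & < > " ' become entities
      # Markup is added only after escaping, so it survives as real SSML.
      escaped.gsub(/\bis\./, '<sub alias="is">is</sub>.')
    end

    puts escape_then_mark_up('Yes it is. Is <b> a tag?')
    ```

    The pattern is case-sensitive and anchored on a word boundary, so "This." and "Is" are left alone. Doing the two steps in the opposite order would escape the sub tags themselves, and they would be read aloud as text.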

    So, in code:

    ;;;;;;;;;;;;;;;;;;;;TTS;;;;;;;;;;;;;;;;;;;;;;
    #^D:: ; Win + Ctrl + D 
    ClipSaved := ClipboardAll   
    Clipboard = ; Start off empty to allow ClipWait to detect when the text has arrived.
    Send ^c
    ClipWait  ; Wait for the clipboard to contain text.
    FileDelete , c:\tmp\tmp_ahk_tts_clip.txt
    FileAppend , %Clipboard% , c:\tmp\tmp_ahk_tts_clip.txt
    RunWait, %comspec% /c ""F:\bin\tools\speakit.rb" c:\tmp\tmp_ahk_tts_clip.txt > c:\tmp\tmp_ahk_clip_tts_out.txt" ,,Hide
    FileRead, Clipboard, c:\tmp\tmp_ahk_clip_tts_out.txt
    ComObjCreate("SAPI.SpVoice").Speak(Clipboard)
    Clipboard := ClipSaved 
    ClipSaved = ; Free the memory 
    return 
    

    and F:\bin\tools\speakit.rb is something like this:

    #!/usr/bin/env ruby
    substitutions = [
    [/[A-Z][A-Z][A-Z][A-Z]+((?=[^A-Za-z])|(?!.))/, lambda{|x|x.downcase}], # Runs of four or more capitals are lowercased so they are read as a word
    [/\.exe(?=[^a-z])/i, " executable "],
    [/\.txt(?=[^a-z])/i, " text file "],
    [/rebranded/, "re-branded"],
    [/App(?=[\s\.])/, " application "],
    ['GUI' , " gooee "],
    [/localhost/, "local host"],
    [/(?<word>[A-Z][a-z]*)(?=[A-Z ,\.;:\t\/])/, "'\\k<word>' "], # Quote each capitalized word and add a space, so CamelCaseWords are read as separate words
    ['\\', '<sub alias="slash">\\</sub>'],
    ]
    
    
    require 'cgi'
    
    puts <<-eos
    <?xml version="1.0"?>
    <speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-GB">
    <voice xml:lang="en-GB">
       #{substitutions.reduce(CGI::escapeHTML(ARGF.read)){|o, (r,s)| s.is_a?(Proc) ? o.gsub(r, &s) : o.gsub(r,s) }}
    </voice>
    </speak>
    eos
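
    The reduce call in the script is dense when written inline. As a rough sketch, the same fold can be pulled out into a function that is easier to try in isolation. RULES and apply_rules are hypothetical names, and the table is a shortened variant of the one above, not the full list.

    ```ruby
    require 'cgi'

    # Hypothetical, shortened rule table: each entry is [pattern, replacement],
    # where the replacement is either a String or a lambda called per match.
    RULES = [
      [/\b[A-Z]{4,}\b/, ->(match) { match.downcase }], # long all-caps runs
      [/localhost/, 'local host'],
      ['GUI', ' gooee ']
    ].freeze

    # Escape the text once, then apply every rule in order to the result.
    def apply_rules(text, rules = RULES)
      rules.reduce(CGI.escapeHTML(text)) do |out, (pattern, replacement)|
        if replacement.is_a?(Proc)
          out.gsub(pattern, &replacement) # lambda computes the replacement
        else
          out.gsub(pattern, replacement)  # plain string replacement
        end
      end
    end

    puts apply_rules('WARNING: localhost is down')
    ```

    Rules run in table order on the output of the previous rule, so a later rule sees the effects of earlier ones. That is worth keeping in mind when a rule inserts markup that another pattern could match.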