Tags: .net, text-to-speech, speech-synthesis

How can I dumb down our cutting-edge Text-to-Speech?


Back in the old days, text-to-speech, cutting edge as it was, was very imperfect. When you typed in a word, it would pretty much read it the way you spelled it... in monotone. The results were often hilarious. Nowadays, text-to-speech is too intelligent to goof in ways that bring a laugh.

As a personal project, I'd like to build an application that brings back this old style of text-to-speech, if only as a toy. In .NET, I have both System.Speech.dll and the SpeechLib COM objects (Microsoft Speech Object Library) available. Both seem to use the OS's built-in text-to-speech, which, again, is too dang smart. Is there any way to configure them to disable whatever it is that makes them intelligent?

I've tried a few different 'SayAs' options and setting the culture to invariant (exception!), and now I'm looking at SSML. It's beginning to look like I'll have to track down the old technology itself, but I don't even know where to begin.
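For reference, here is a minimal sketch of the kind of SSML experiment mentioned above. It only builds the markup string: each word is wrapped in a `say-as` element with `interpret-as="characters"`, and the whole utterance in a `prosody` element with a narrow pitch range and slow rate, as a rough attempt at flat, letter-by-letter output. The class name `RetroSsml`, the `Build` helper, and the specific attribute values are illustrative assumptions, and whether a given engine actually honours these hints is engine-dependent.

```csharp
using System;
using System.Security;
using System.Text;

public static class RetroSsml
{
    // Hypothetical helper: builds an SSML 1.0 document that asks the engine
    // to read each word as individual characters, inside a <prosody> element
    // with a narrow pitch range ("x-low") and a slow rate.
    public static string Build(string text, string lang = "en-US")
    {
        var sb = new StringBuilder();
        sb.Append("<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" ");
        sb.Append("xml:lang=\"").Append(lang).Append("\">");
        sb.Append("<prosody range=\"x-low\" rate=\"slow\">");

        // Split on spaces, dropping empty entries caused by repeated spaces.
        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Escape &, <, >, and quotes so user text cannot break the XML.
            sb.Append("<say-as interpret-as=\"characters\">")
              .Append(SecurityElement.Escape(word))
              .Append("</say-as> ");
        }

        sb.Append("</prosody></speak>");
        return sb.ToString();
    }

    public static void Main()
    {
        // Print the generated markup for inspection.
        Console.WriteLine(Build("here we go"));
    }
}
```

The resulting string could then be passed to `SpeechSynthesizer.SpeakSsml` from System.Speech. This still goes through the modern engine's text analysis, so at best it changes how words are read (letter by letter); it does not restore the naive spelling-based pronunciation described above.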

As an example of the chaos I'm hoping to see, here's some Moonbase Alpha for you: http://www.youtube.com/watch?v=Hv6RbEOlqRo (Make sure you are wearing headphones!)

Con flab these newfangled text-to-phoneme converters, and normalizers, and cableless phones, and...


Solution

  • Well, I just managed to stumble across the old "Microsoft Voice Text" library: vtext.dll

    This seems to be what I was looking for! Compared to modern TTS libraries, the interface is very simple. The result doesn't seem to be exactly the same as the voice in that video I linked, but that was probably a different implementation. Either way, it's time to reminisce.

    // Needs a COM reference to vtext.dll (Microsoft Voice Text); its interop types live in HTTSLib.
    var tts = new HTTSLib.TextToSpeech();
    tts.Speak("ebrbrbrbrbrbrbrbr");

    For some reason, it crashes vshost.exe when I make it say "here", but since this is just a dumb personal project, I can ignore that.