javascript html canvas text-to-speech

Text to speech with avatar lip sync, no plug-ins


Is there a JavaScript library or product that provides text-to-speech for animated, speaking avatars without using Flash or any other plug-in? The idea is that I type in text and the avatar's mouth moves as the audio is played.

The aim is a cross-browser, cross device, no-plugins, web-based talking chat avatar.

I looked at CrazyTalk, which seemed perfect, but sadly it turns out that it relies on the Unity engine.

I then started to think about rolling my own by combining existing text-to-speech services, pulling phonemes out of the audio wave, and building my own dictionary that maps phonemes to canvas shapes. A tool for extracting phonemes that way doesn't really seem to exist either (and even if it did, I'm not sure how I would time the mouth movement to the audio).

It's 2015. I feel like something like this should already exist and I shouldn't be trying to invent it.

Edit: Now I'm looking into Microsoft.Speech. I really need something that spits out something like IPA in syllables, and I'm not sure if MS.Speech does that. TTS wave creation is the easy part. I could send text to the server and match phonetic syllables to mouth point coordinates... if I could just get those syllables broken out. What breaks text into phonetic syllables?


Solution

  • I think I have an approach. In short, no, there does not appear to be an existing utility... Yet ;-)

    I've decided to go with the Microsoft Speech Platform. It does better than return phonemes: it provides the accompanying viseme IDs along with the audio position at which each one occurs. So I can generate a WAV file and a viseme metadata list server-side and retrieve them both. Now to figure out how to synchronize them.
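
For the synchronization step, one possible approach is to let the audio element's own clock drive the mouth: on each animation frame, read `audio.currentTime`, look up which viseme is active at that position, and redraw only when it changes. The following is a minimal sketch, not a tested implementation. It assumes the server returns the viseme list as JSON sorted by start time, and the field names `timeMs` and `visemeId`, the functions `visemeAt` and `startLipSync`, and the `drawMouth` callback are all hypothetical names chosen for illustration.

```javascript
// Hypothetical shape of the server-generated metadata: one entry per viseme,
// sorted by the audio position (in milliseconds) at which it starts.
const exampleTimeline = [
  { timeMs: 0, visemeId: 0 },
  { timeMs: 120, visemeId: 12 },
  { timeMs: 260, visemeId: 4 },
  { timeMs: 410, visemeId: 0 },
];

// Binary search for the last entry whose start time is <= timeMs.
// Returns its viseme ID, or null if the timeline is empty or
// timeMs is earlier than the first entry.
function visemeAt(timeline, timeMs) {
  let lo = 0;
  let hi = timeline.length - 1;
  let found = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timeline[mid].timeMs <= timeMs) {
      found = timeline[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found ? found.visemeId : null;
}

// Browser-only: while the audio plays, poll its clock once per frame and
// call drawMouth(visemeId) whenever the active viseme changes.
// drawMouth is an assumed callback that paints the mouth shape on a canvas.
function startLipSync(audio, timeline, drawMouth) {
  let current = null;
  let running = false;

  function tick() {
    // HTMLMediaElement.currentTime is in seconds; the timeline is in ms.
    const id = visemeAt(timeline, audio.currentTime * 1000);
    if (id !== null && id !== current) {
      current = id;
      drawMouth(id);
    }
    if (!audio.paused && !audio.ended) {
      requestAnimationFrame(tick);
    } else {
      running = false;
    }
  }

  audio.addEventListener("play", () => {
    if (!running) {
      running = true;
      requestAnimationFrame(tick);
    }
  });
}
```

Reading the position from the audio element on every frame, rather than scheduling one timer per viseme up front, means the mouth should stay aligned even if playback stalls, is paused, or is seeked. Usage would look something like `startLipSync(document.querySelector("audio"), timeline, drawMouth)` once both the WAV and the viseme list have loaded.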