c# winforms desktop-application

How can I use the Google Text-to-Speech API in a Windows Forms application?


I want to use Google text-to-speech in my Windows Forms application to read the text of a label aloud. I have added a reference to System.Speech. How can I make it read a label in a button click event? http://translate.google.com/translate_tts?q=testing+google+speech is the Google text-to-speech API. Alternatively, how can I use Microsoft's native text-to-speech?


Solution

  • UPDATE: Google's TTS API is no longer publicly available. The notes at the bottom about Microsoft's TTS are still relevant and provide equivalent functionality.


    You can use Google's TTS API from your WinForms application by downloading the MP3 response and playing it, using a variation of this question's answer (it took me a while, but I have a real solution):

    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            this.FormClosing += (sender, e) =>
                {
                    if (waiting)
                        stop.Set();
                };
        }
    
        private void ButtonClick(object sender, EventArgs e)
        {
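            var clicked = sender as Button;
            if (clicked == null || clicked.Tag == null)
                return;
    
            // The button's Tag holds the name of the label it should read.
            var relatedLabel = this.Controls.Find(clicked.Tag.ToString(), true).FirstOrDefault() as Label;
    
            if (relatedLabel == null)
                return;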
    
            var playThread = new Thread(() => PlayMp3FromUrl("http://translate.google.com/translate_tts?q=" + HttpUtility.UrlEncode(relatedLabel.Text)));
            playThread.IsBackground = true;
            playThread.Start();
        }
    
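        // Written by the playback thread and read by the UI thread, hence volatile.
        volatile bool waiting = false;
        // Signalled when the form closes so the playback thread stops waiting early.
        AutoResetEvent stop = new AutoResetEvent(false);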
        public void PlayMp3FromUrl(string url)
        {
            using (Stream ms = new MemoryStream())
            {
                using (Stream stream = WebRequest.Create(url)
                    .GetResponse().GetResponseStream())
                {
                    byte[] buffer = new byte[32768];
                    int read;
                    while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        ms.Write(buffer, 0, read);
                    }
                }
    
                ms.Position = 0;
                using (WaveStream blockAlignedStream =
                    new BlockAlignReductionStream(
                        WaveFormatConversionStream.CreatePcmStream(
                            new Mp3FileReader(ms))))
                {
                    using (WaveOut waveOut = new WaveOut(WaveCallbackInfo.FunctionCallback()))
                    {
                        waveOut.Init(blockAlignedStream);
                        waveOut.PlaybackStopped += (sender, e) =>
                        {
                            waveOut.Stop();
                        };
    
                        waveOut.Play();
                        waiting = true;
                        stop.WaitOne(10000);
                        waiting = false;
                    }
                }
            }
        }
    }
    

    NOTE: The above code requires NAudio (free and open source), a reference to the System.Web assembly for HttpUtility, and using directives for System.IO, System.Linq, System.Net, System.Threading, System.Web, and NAudio.Wave.

    My Form1 has 2 controls on it:

    1. A Label named label1
    2. A Button named button1 with a Tag of label1 (used to bind the button to its label)
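    As a rough illustration of that Tag-based binding, here is a minimal sketch that builds the two controls in code instead of the designer. The class name TagBindingForm, the helper FindLabelFor, and the MessageBox stand-in for playback are assumptions made for this example, not part of the original form:

```csharp
using System;
using System.Linq;
using System.Windows.Forms;

// Hypothetical form used only to illustrate binding a button to a label via Tag.
public class TagBindingForm : Form
{
    public TagBindingForm()
    {
        var label1 = new Label { Name = "label1", Text = "Hello from label1", AutoSize = true, Left = 10, Top = 10 };

        // Tag stores the Name of the label this button should read.
        var button1 = new Button { Name = "button1", Text = "Read", Left = 10, Top = 40, Tag = "label1" };
        button1.Click += ButtonClick;

        Controls.Add(label1);
        Controls.Add(button1);
    }

    // Looks up the label whose Name matches the button's Tag; returns null if none is found.
    public Label FindLabelFor(Button button)
    {
        if (button == null || button.Tag == null)
            return null;

        return Controls.Find(button.Tag.ToString(), true).FirstOrDefault() as Label;
    }

    private void ButtonClick(object sender, EventArgs e)
    {
        var relatedLabel = FindLabelFor(sender as Button);
        if (relatedLabel != null)
            MessageBox.Show(relatedLabel.Text); // stand-in for starting playback
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new TagBindingForm());
    }
}
```

    Because the lookup goes through Controls.Find, one ButtonClick handler can serve any number of button/label pairs as long as each button's Tag matches its label's Name.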

    The above code can be simplified slightly if you have a separate event handler for each button/label combination, using something like this (untested):

        private void ButtonClick(object sender, EventArgs e)
        {
            var playThread = new Thread(() => PlayMp3FromUrl("http://translate.google.com/translate_tts?q=" + HttpUtility.UrlEncode(label1.Text)));
            playThread.IsBackground = true;
            playThread.Start();
        }
    

    There are problems with this solution, though (this list is probably not complete; I'm sure comments and real-world usage will find others):

    1. Notice the stop.WaitOne(10000); call in the first code snippet. The 10000 caps playback at 10 seconds of audio, so it will need to be increased if your label takes longer than that to read. This is necessary because the current version of NAudio (v1.5.4.0) seems to have a problem determining when the stream is done playing. It may be fixed in a later version, or perhaps there is a workaround that I didn't take the time to find. One temporary workaround is to use a ParameterizedThreadStart so that the timeout is passed to the thread as a parameter. This would allow variable timeouts but would not technically fix the problem.
    2. More importantly, the Google TTS API is unofficial (meaning it is not intended to be consumed by non-Google applications) and is subject to change without notice at any time. If you need something that will work in a commercial environment, I'd suggest either the MS TTS solution (as your question suggests) or one of the many commercial alternatives, although none of those alternatives tends to be even this simple.
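    The ParameterizedThreadStart workaround from the first point could look roughly like the console sketch below. It leaves out the download and NAudio playback so that it runs on its own; PlaybackRequest, TimeoutDemo, and the placeholder URL are hypothetical names introduced only for this illustration:

```csharp
using System;
using System.Threading;

// Hypothetical container for the values the playback thread needs.
public sealed class PlaybackRequest
{
    public string Url { get; set; }
    public int TimeoutMilliseconds { get; set; }
}

public static class TimeoutDemo
{
    private static readonly AutoResetEvent stop = new AutoResetEvent(false);

    // Matches the ParameterizedThreadStart signature: void Method(object).
    private static void Play(object state)
    {
        var request = (PlaybackRequest)state;

        // The real method would download request.Url and start NAudio playback here.

        // Wait for a stop signal, but no longer than the caller-supplied timeout.
        bool signalled = stop.WaitOne(request.TimeoutMilliseconds);
        Console.WriteLine(signalled ? "stopped early" : "timed out");
    }

    public static void Main()
    {
        var playThread = new Thread(new ParameterizedThreadStart(Play));
        playThread.IsBackground = true;
        playThread.Start(new PlaybackRequest
        {
            Url = "http://example.invalid/audio.mp3", // placeholder, never fetched
            TimeoutMilliseconds = 500
        });
        playThread.Join(); // only so the console demo does not exit early
    }
}
```

    Nothing signals the event in this demo, so the wait simply runs out after the 500 ms passed in. In the form, the caller would choose the timeout per label, for example based on the length of its text.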

    To answer the other side of your question:

    The System.Speech.Synthesis.SpeechSynthesizer class is much easier to use, and you can count on it being reliably available (whereas the Google API could be gone tomorrow).

    It is really as easy as adding a reference to the System.Speech assembly and:

    public void SaySomething(string somethingToSay)
    {
        var synth = new System.Speech.Synthesis.SpeechSynthesizer();
    
        synth.SpeakAsync(somethingToSay);
    }
    

    This just works.
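    To connect this to the button-and-label scenario in the question, here is a minimal sketch of a form that reads a label aloud when a button is clicked. The class name SpeakLabelForm, the control layout, and the choice to keep a single synthesizer in a field are assumptions for illustration only:

```csharp
using System;
using System.Speech.Synthesis; // requires a reference to the System.Speech assembly
using System.Windows.Forms;

// Hypothetical form: clicking the button reads label1 aloud.
public class SpeakLabelForm : Form
{
    // One synthesizer is kept for the lifetime of the form and disposed when it closes.
    private readonly SpeechSynthesizer synth = new SpeechSynthesizer();
    private readonly Label label1;

    public SpeakLabelForm()
    {
        label1 = new Label { Name = "label1", Text = "Testing text to speech", AutoSize = true, Left = 10, Top = 10 };
        var button1 = new Button { Name = "button1", Text = "Read", Left = 10, Top = 40 };

        synth.SetOutputToDefaultAudioDevice();

        button1.Click += (sender, e) =>
        {
            // Cancel anything still queued so repeated clicks do not pile up.
            synth.SpeakAsyncCancelAll();
            // SpeakAsync returns immediately, so the UI thread is not blocked.
            synth.SpeakAsync(label1.Text);
        };

        FormClosed += (sender, e) => synth.Dispose();

        Controls.Add(label1);
        Controls.Add(button1);
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new SpeakLabelForm());
    }
}
```

    Holding the synthesizer in a field rather than creating one per call is a design choice made here so that queued speech can be cancelled and the object can be disposed with the form.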

    Trying to use the Google TTS API was a fun experiment, but I'd be hard-pressed to recommend it for production use. If you don't want to pay for a commercial alternative, Microsoft's solution is about as good as it gets.