javascript, c#, mediarecorder, google-speech-api

Google Cloud Speech API on HTML page


I have implemented the Google Cloud Speech API in a C# console application. Now I want to do the same on an HTML page. These are the steps I have followed:

  1. Captured the voice on the HTML page using MediaRecorder and posted it to a Web API:
      mediaRecorder.ondataavailable = function (e) {
          // Collect the new chunk and upload everything recorded so far.
          chunks.push(e.data);
          var blob = new Blob(chunks, { 'type': 'audio/wav; codecs=0' });
          var fd = new FormData();
          fd.append('fname', 'test.wav');
          //fd.append('data', chunks[0]);
          fd.append('data', blob);
          $.ajax({
              type: 'POST',
              url: APIUrl,
              data: fd,
              processData: false,  // let the browser send the FormData as-is
              contentType: false   // let the browser set the multipart boundary
          }).done(function (data) {
              console.log(data);
          });
      };
  2. On the Web API I am using Google Cloud speech recognition. Unfortunately, it returns a null response. The test file provided by Google, Audio.raw, works fine with the same code, but audio sent from the web page does not produce any results.
            string text = "";
            var speech = SpeechClient.Create();


            var response = speech.Recognize(new RecognitionConfig()
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.OggOpus,
                SampleRateHertz = 48000,
                LanguageCode = "en",

            }, RecognitionAudio.FromStream(HttpContext.Current.Request.Files[0].InputStream));

            foreach (var result in response.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    text = alternative.Transcript;
                }
            }

I have tried different combinations of encoding and sample rate, but none of them works. I also tried saving the audio to the local drive in WAV format first and running recognition on the local file, but that does not work either.


Solution

  • You are not recording in the format you think you are recording.

    • MediaRecorder in Chrome only supports the Opus codec in a WebM container.
    • MediaRecorder in Firefox, however, supports the Opus codec in an Ogg container.

    This can be quickly validated by running the following snippet in each browser's JS console. Each call returns true or false depending on support.

    MediaRecorder.isTypeSupported('audio/webm;codecs=opus')
    MediaRecorder.isTypeSupported('audio/ogg;codecs=opus')
    

    The Google Cloud Speech API supports Opus, but only in an Ogg container. If you run the same code in Firefox, the recording should work with the Speech API.
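
One way to act on this on the client is to ask the browser which container it can record in and to label the upload accordingly, instead of hard-coding `audio/wav`. The following is a minimal sketch, not the only way to do it: `CANDIDATE_TYPES`, `pickMimeType` and `extensionFor` are hypothetical names introduced for illustration, and the support check is passed in as a function so the logic can also run outside a browser.

```javascript
// Candidate MIME types in order of preference. Ogg/Opus comes first because
// that is the Opus container the Speech API is described as accepting.
const CANDIDATE_TYPES = ['audio/ogg;codecs=opus', 'audio/webm;codecs=opus'];

// Returns the first candidate the predicate accepts, or null if none match.
// `isSupported` is injected; in a browser it would wrap
// MediaRecorder.isTypeSupported.
function pickMimeType(candidates, isSupported) {
  for (const type of candidates) {
    if (isSupported(type)) return type;
  }
  return null;
}

// Maps a MIME type to a file extension for the upload's file name
// (the naming scheme is an assumption, not something the API requires).
function extensionFor(mimeType) {
  if (mimeType.startsWith('audio/ogg')) return 'ogg';
  if (mimeType.startsWith('audio/webm')) return 'webm';
  return 'bin';
}

// Browser usage (assumes `stream` came from getUserMedia):
//   const type = pickMimeType(CANDIDATE_TYPES,
//                             (t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, type ? { mimeType: type } : undefined);
//   ...
//   const blob = new Blob(chunks, { type: recorder.mimeType });
//   fd.append('fname', 'test.' + extensionFor(recorder.mimeType));
```

With this in place the server can tell from the file name or content type whether the upload is already Ogg or still needs to be re-muxed.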

    For this to work with Chrome, you will need to re-mux the file into an Ogg container on the server side before sending it to the Cloud Speech API.

    You can use ffmpeg to do so:

    ffmpeg -i file_chrome.wav -acodec copy resources/file.oga

    Note that this is a re-mux and not a re-encode process. You are just copying the same data in a different container.
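
If the re-mux has to happen automatically for every upload, the same command can be launched from server code. The sketch below uses Node.js purely for illustration (the question's server is C#, where the equivalent would be starting a process). It assumes `ffmpeg` is on the PATH; `buildRemuxArgs` and `remuxToOgg` are hypothetical helper names, and the `-y` flag (overwrite the output without asking) is an addition to the command above so that ffmpeg never waits for input.

```javascript
// Builds the argument list for the re-mux command shown above.
// '-y' is added so ffmpeg overwrites an existing output file instead of
// prompting, which would stall a non-interactive server process.
function buildRemuxArgs(inputPath, outputPath) {
  return ['-y', '-i', inputPath, '-acodec', 'copy', outputPath];
}

// Runs ffmpeg and resolves with the output path once it exits successfully.
function remuxToOgg(inputPath, outputPath) {
  // Loaded lazily so buildRemuxArgs stays usable without Node's process APIs.
  const { execFile } = require('node:child_process');
  return new Promise((resolve, reject) => {
    execFile('ffmpeg', buildRemuxArgs(inputPath, outputPath), (error) => {
      if (error) reject(error); // non-zero exit code or ffmpeg not found
      else resolve(outputPath);
    });
  });
}

// Usage (paths are placeholders):
//   remuxToOgg('file_chrome.wav', 'resources/file.oga')
//     .then((out) => console.log('re-muxed to', out))
//     .catch((err) => console.error('ffmpeg failed:', err.message));
```

Because the audio stream is copied rather than re-encoded, this step should be fast and lossless.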

    Bonus tip: if you are on Linux/Mac, you can use the file <file_name> command to check the output file type. The Chrome file will show up as WebM, and the Firefox output will show up as Ogg data, Opus audio.
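
The same check can be done in code by looking at the first bytes of the upload, which is roughly what the `file` command does. This is a minimal sketch with a hypothetical function name, `sniffContainer`; it only recognizes the three containers relevant here, using their well-known signatures: `OggS` for Ogg, the EBML header `1A 45 DF A3` for WebM (shared with Matroska), and `RIFF....WAVE` for WAV.

```javascript
// Guesses the container format from the leading bytes of a file.
// `bytes` can be a Uint8Array, a Node Buffer, or a plain array of numbers.
function sniffContainer(bytes) {
  // True if `signature` appears in `bytes` starting at `offset`.
  const hasSignature = (signature, offset = 0) =>
    signature.every((b, i) => bytes[offset + i] === b);

  if (hasSignature([0x4f, 0x67, 0x67, 0x53])) return 'ogg';   // "OggS"
  if (hasSignature([0x1a, 0x45, 0xdf, 0xa3])) return 'webm';  // EBML header (also Matroska)
  if (hasSignature([0x52, 0x49, 0x46, 0x46]) &&               // "RIFF"
      hasSignature([0x57, 0x41, 0x56, 0x45], 8)) {            // "WAVE" at offset 8
    return 'wav';
  }
  return 'unknown';
}

// Usage: a recording saved from Chrome with a .wav name still sniffs as WebM.
//   sniffContainer(firstBytesOfUpload) === 'webm'
```

A server could use a check like this to decide whether an upload needs the re-mux step before it is sent to the Speech API.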