c#, google-cloud-speech

Speech-to-Text RecognitionAudio.FromBytes always returns a blank result


I am trying to use Google Speech-to-Text in my code. I have a live stream with video and audio in m3u8 format, and I am using FFmpeg to extract the audio from the live URL. I then send the extracted audio to the Google API as a byte[] (without saving it to disk) to get a transcription back. The stream is processed in chunks.

The API never returns any result and never throws an error. Can someone tell me why the results are always blank?

I am using the code below to call RecognitionAudio.FromBytes:

    outputStream = ffmpeg.StandardOutput.BaseStream;
    byte[] buffer = new byte[16 * 1024];
    using (MemoryStream ms = new MemoryStream())
    {
        int read;
        while ((read = outputStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            ms.Write(buffer, 0, read);
            System.Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", "Demo.json");
            var speech = SpeechClient.Create();
            var longOperation = speech.Recognize(new RecognitionConfig()
            {
                Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                EnableSeparateRecognitionPerChannel = true,
                SampleRateHertz = 16000,
                LanguageCode = "en",
            }, RecognitionAudio.FromBytes(ms.ToArray()));
            //    longOperation = longOperation.PollUntilCompleted();
            //  var response = longOperation.Results;
            foreach (var result in longOperation.Results)
            {
                foreach (var alternative in result.Alternatives)
                {
                    Console.WriteLine(alternative.Transcript);
                }
            }
        }
    }

Solution

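  • A blank response with no error can indicate that the audio encoding is incorrect, i.e. the bytes sent do not match what the request declares (here Linear16 at 16000 Hz). Google's Speech-to-Text troubleshooting documentation covers this case.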
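
One way to rule out an encoding mismatch is to make FFmpeg emit exactly the format the request declares. The sketch below is not from the original answer: PcmHelpers and its members are hypothetical names, and mono output is an assumption. It builds FFmpeg arguments that drop the video and write headerless 16-bit little-endian PCM at 16000 Hz to stdout, which is what Linear16 with SampleRateHertz = 16000 describes.

```csharp
using System;
using System.Diagnostics;

// Sketch only: PcmHelpers and its members are hypothetical names,
// not part of FFmpeg or the Google client library.
public static class PcmHelpers
{
    // Values chosen to match the RecognitionConfig in the question.
    public const int SampleRateHertz = 16000;
    public const int Channels = 1;        // assumption: mono output
    public const int BytesPerSample = 2;  // Linear16 = 16-bit signed PCM

    // FFmpeg arguments that decode the input, drop the video (-vn) and
    // write headerless 16-bit little-endian PCM to stdout (pipe:1).
    public static string BuildFfmpegArguments(string inputUrl)
    {
        return $"-i \"{inputUrl}\" -vn -acodec pcm_s16le -f s16le " +
               $"-ar {SampleRateHertz} -ac {Channels} pipe:1";
    }

    // Number of bytes that hold the given duration of audio in this format.
    public static int BytesForSeconds(int seconds)
    {
        return SampleRateHertz * Channels * BytesPerSample * seconds;
    }

    // Process settings for reading FFmpeg's stdout as a raw byte stream.
    public static ProcessStartInfo CreateStartInfo(string ffmpegPath, string inputUrl)
    {
        return new ProcessStartInfo
        {
            FileName = ffmpegPath,
            Arguments = BuildFfmpegArguments(inputUrl),
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };
    }
}
```

With these settings one second of audio is 32000 bytes, so BytesForSeconds can be used to cut the stream into fixed-duration chunks before each call instead of resending everything buffered so far. If the output is mono as assumed here, EnableSeparateRecognitionPerChannel is likely unnecessary in the request.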