Tags: asynchronous, asp.net-core, websocket, google-cloud-platform, google-speech-api

Google Speech API streaming audio from websocket


I am trying to get a final speech transcription/recognition result from a Fleck websocket audio stream. OnOpen runs when the websocket connection is first established, and OnBinary runs whenever binary data is received from the client. I tested the websocket with an echo: I spoke into it and wrote the same binary data back to the client at the same rate. That test worked, so I know the binary data is being sent correctly (640-byte messages with a 20 ms frame size).

Therefore, the failure is in my code and not in the service. My aim is to do the following:

  1. When the websocket connection is created, send the initial audio config request to the API with SingleUtterance == true
  2. Run a background task that listens for the streaming results waiting for isFinal == true
  3. Send each binary message received to the API for transcription
  4. When background task recognises isFinal == true, stop current streaming request and create a new request - repeating steps 1 through 4

The context of this project is transcribing all single utterances in a live phone call.

socket.OnOpen = () =>
{
    firstMessage = true;
};
socket.OnBinary = async binary =>
{
    var speech = SpeechClient.Create();
    var streamingCall = speech.StreamingRecognize();
    if (firstMessage == true)
    {
        await streamingCall.WriteAsync(
            new StreamingRecognizeRequest()
            {
                StreamingConfig = new StreamingRecognitionConfig()
                {
                    Config = new RecognitionConfig()
                    {
                        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                        SampleRateHertz = 16000,
                        LanguageCode = "en",
                    },
                    SingleUtterance = true,
                }
            });
        Task getUtterance = Task.Run(async () =>
        {
            while (await streamingCall.ResponseStream.MoveNext(
                default(CancellationToken)))
            {
                foreach (var result in streamingCall.ResponseStream.Current.Results)
                {
                    if (result.IsFinal == true)
                    {
                        Console.WriteLine("This test finally worked");
                    }
                }
            }
        });
        firstMessage = false;
    }
    else if (firstMessage == false)
    {
        streamingCall.WriteAsync(new StreamingRecognizeRequest()
        {
            AudioContent = Google.Protobuf.ByteString.CopyFrom(binary, 0, 640)
        }).Wait();
    }
};

Solution

  • .Wait() is a blocking call made inside an async handler. Blocking calls and async/await don't mix well and can lead to deadlocks.

    Simply keep the code async all the way through:

    //...omitted for brevity
    
    else if (firstMessage == false) {
        await streamingCall.WriteAsync(new StreamingRecognizeRequest() {
            AudioContent = Google.Protobuf.ByteString.CopyFrom(binary, 0, 640)
        });
    }
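
Separately from the .Wait() call, the handler in the question calls SpeechClient.Create() and speech.StreamingRecognize() on every binary message, so each audio frame is written to a fresh stream that never received the config. Below is a minimal, untested sketch of one way to keep a single stream per utterance and follow steps 1 through 4 from the question. UtteranceTranscriber, SendAudioAsync, OpenCallAsync, ListenAsync and RetireAsync are hypothetical names, not part of Fleck or the Google client library, and the sketch assumes the same library version as the question (the one exposing the ResponseStream property).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.Speech.V1;
using Google.Protobuf;

// Hypothetical helper (not part of Fleck or Google.Cloud.Speech.V1).
// Create one instance per websocket connection, e.g. in socket.OnOpen.
public sealed class UtteranceTranscriber
{
    private readonly SpeechClient _speech = SpeechClient.Create();
    // Ensures a frame is never written while the stream is being swapped.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private SpeechClient.StreamingRecognizeStream _call; // null = no open utterance

    // Step 3: forward one audio frame, opening a stream first if needed.
    public async Task SendAudioAsync(byte[] audio)
    {
        await _gate.WaitAsync();
        try
        {
            if (_call == null)
            {
                _call = await OpenCallAsync();
            }
            await _call.WriteAsync(new StreamingRecognizeRequest
            {
                AudioContent = ByteString.CopyFrom(audio, 0, audio.Length)
            });
        }
        finally
        {
            _gate.Release();
        }
    }

    // Steps 1 and 2: send the config once, then start the result listener.
    private async Task<SpeechClient.StreamingRecognizeStream> OpenCallAsync()
    {
        var call = _speech.StreamingRecognize();
        await call.WriteAsync(new StreamingRecognizeRequest
        {
            StreamingConfig = new StreamingRecognitionConfig
            {
                Config = new RecognitionConfig
                {
                    Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                    SampleRateHertz = 16000,
                    LanguageCode = "en",
                },
                SingleUtterance = true,
            }
        });
        _ = Task.Run(() => ListenAsync(call)); // errors are not observed in this sketch
        return call;
    }

    // Step 4: print final results and retire the stream when the utterance ends.
    private async Task ListenAsync(SpeechClient.StreamingRecognizeStream call)
    {
        // Same API as in the question; newer library versions use GetResponseStream().
        var responses = call.ResponseStream;
        while (await responses.MoveNext(default(CancellationToken)))
        {
            var response = responses.Current;
            if (response.SpeechEventType ==
                StreamingRecognizeResponse.Types.SpeechEventType.EndOfSingleUtterance)
            {
                await RetireAsync(call);
            }
            foreach (var result in response.Results)
            {
                if (result.IsFinal && result.Alternatives.Count > 0)
                {
                    Console.WriteLine(result.Alternatives[0].Transcript);
                }
            }
        }
    }

    // Half-closes the stream; the next audio frame opens a new one.
    private async Task RetireAsync(SpeechClient.StreamingRecognizeStream call)
    {
        await _gate.WaitAsync();
        try
        {
            if (ReferenceEquals(_call, call))
            {
                _call = null;
                await call.WriteCompleteAsync();
            }
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

With this in place, the Fleck handlers would reduce to something like socket.OnOpen = () => transcriber = new UtteranceTranscriber(); and socket.OnBinary = async binary => await transcriber.SendAudioAsync(binary); where transcriber is a per-connection variable. Because OnBinary takes an Action<byte[]>, the async lambda is effectively async void, so an exception thrown inside it is not returned to Fleck and should be caught in the handler. If strict frame ordering matters, feeding the frames through a single-consumer queue instead of the semaphore may be a safer choice.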