c# azure speech-to-text

Azure speech to text transcription doesn't run continuously


I originally ran Azure speech-to-text code that transcribed up to 15 seconds of speech from a file. Now I'm trying to adapt it to transcribe longer utterances, but it still cuts out after 15 seconds of speech. The code is:

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace NEST {
    class Program {
        static async Task Main(string[] args) {
            await StartContinuousRecognitionAsync();
        }

        static async Task StartContinuousRecognitionAsync() {
            // Configure the subscription information for the service to access.
            // Use either key1 or key2 from the Speech Service resource you have created
            var config = SpeechConfig.FromSubscription("subscriptionkey", "region");

            // Setup the audio configuration, in this case, using a file that is in local storage.
            using(var audioInput = AudioConfig.FromWavFileInput("C:/Users/MichaelSchwartz/source/repos/AI-102-Process-Speech-master/transcribe_speech_to_text/media/spkr1.wav"))

            // Pass the required parameters to the Speech Service which includes the configuration information
            // and the audio file name that you will use as input
            using(var recognizer = new SpeechRecognizer(config, audioInput)) {
                Console.WriteLine("Recognizing first result...");
                var result = await recognizer.StartContinuousRecognitionAsync();

                switch (result.Reason) {
                case ResultReason.RecognizedSpeech:
                    // The file contained speech that was recognized and the transcription will be output
                    // to the terminal window
                    Console.WriteLine($"We recognized: {result.Text}");
                    break;
                case ResultReason.NoMatch:
                    // No recognizable speech found in the audio file that was supplied.
                    // Output an informative message
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    break;
                case ResultReason.Canceled:
                    // Operation was cancelled
                    // Output the reason
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                    if (cancellation.Reason == CancellationReason.Error) {
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
                        Console.WriteLine($"CANCELED: Did you update the subscription info?");
                    }
                    break;
                }
            }
        }
    }
}

The error returned is:

Cannot assign void to an implicitly-typed variable [NEST]csharp(CS0815).

How do I resolve this and transcribe utterances longer than 15 seconds? Thanks in advance.


Solution

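  • Not sure which version of the SDK you're using, but the official docs handle results through event handlers (delegates) rather than checking result.Reason on a return value, as your code does. StartContinuousRecognitionAsync returns a plain Task with no result, so awaiting it yields nothing to assign to var, which is what CS0815 is reporting. Subscribe to the recognizer's events first, then start recognition: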

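    // speechConfig is created the same way as config in the question.
    var speechConfig = SpeechConfig.FromSubscription("subscriptionkey", "region");
    using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
    using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);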
    
    var stopRecognition = new TaskCompletionSource<int>();
    
    recognizer.Recognizing += (s, e) =>
    {
        Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
    };
    
    recognizer.Recognized += (s, e) =>
    {
        if (e.Result.Reason == ResultReason.RecognizedSpeech)
        {
            Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
        }
        else if (e.Result.Reason == ResultReason.NoMatch)
        {
            Console.WriteLine($"NOMATCH: Speech could not be recognized.");
        }
    };
    
    recognizer.Canceled += (s, e) =>
    {
        Console.WriteLine($"CANCELED: Reason={e.Reason}");
    
        if (e.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
    
        stopRecognition.TrySetResult(0);
    };
    
    recognizer.SessionStopped += (s, e) =>
    {
        Console.WriteLine("\n    Session stopped event.");
        stopRecognition.TrySetResult(0);
    };
    
    await recognizer.StartContinuousRecognitionAsync();
    
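    // Waits for completion. Use Task.WaitAny to keep the task rooted.
    Task.WaitAny(new[] { stopRecognition.Task });
    
    // Stop recognition once the session has ended or been canceled.
    await recognizer.StopContinuousRecognitionAsync();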
    

    Source: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/get-started-speech-to-text?tabs=windowsinstall&pivots=programming-language-csharp
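
For the program in the question, the same event-based approach could be assembled into a complete console app roughly as follows. This is a sketch under assumptions, not the official sample: it assumes the Microsoft.CognitiveServices.Speech NuGet package and C# 8 or later, reuses the placeholder key and region from the question, uses a hypothetical relative file name (spkr1.wav), and collects the recognized phrases in a StringBuilder, which neither snippet above does.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace NEST
{
    class Program
    {
        static async Task Main(string[] args)
        {
            // Placeholders from the question: substitute your own key and region.
            var speechConfig = SpeechConfig.FromSubscription("subscriptionkey", "region");

            // Hypothetical file name; point this at your own WAV file.
            using var audioConfig = AudioConfig.FromWavFileInput("spkr1.wav");
            using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

            // Accumulates every final phrase so the whole file ends up in one transcript.
            var transcript = new StringBuilder();

            // Completed when the session stops or is canceled. Continuations run
            // asynchronously so they do not execute on the SDK's callback thread.
            var stopRecognition = new TaskCompletionSource<int>(
                TaskCreationOptions.RunContinuationsAsynchronously);

            // Fires once per finalized phrase, not once per file.
            recognizer.Recognized += (s, e) =>
            {
                if (e.Result.Reason == ResultReason.RecognizedSpeech)
                {
                    transcript.AppendLine(e.Result.Text);
                }
            };

            // Fires on errors and also when the end of the input file is reached.
            recognizer.Canceled += (s, e) =>
            {
                if (e.Reason == CancellationReason.Error)
                {
                    Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                    Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                }
                stopRecognition.TrySetResult(0);
            };

            recognizer.SessionStopped += (s, e) => stopRecognition.TrySetResult(0);

            // Returns a Task with no result, so there is nothing to assign here.
            await recognizer.StartContinuousRecognitionAsync();

            // Wait until one of the handlers above signals completion, then stop.
            await stopRecognition.Task;
            await recognizer.StopContinuousRecognitionAsync();

            Console.WriteLine(transcript.ToString());
        }
    }
}
```

Because Main is already async here, the sketch awaits stopRecognition.Task directly instead of blocking with Task.WaitAny; either should work for a simple console app.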