speech-recognition, azure-cognitive-services, voice-recognition

Azure Cognitive Speech Services STT - Partial Text


When I run an audio file through STT with the code below, it only returns the first letter/word of the entire audio.

The audio has "A B C D E F"

What am I missing?

Imports Microsoft.CognitiveServices.Speech
Imports Microsoft.CognitiveServices.Speech.SpeechConfig
Imports Microsoft.CognitiveServices.Speech.Audio

Module Module1

    Sub Main()
        Dim SpeechConfig As SpeechConfig = FromSubscription("<CHANGED>", "eastus")
        Dim audioConfig As Audio.AudioConfig = Audio.AudioConfig.FromWavFileInput("<CHANGED>.wav")
        SpeechConfig.OutputFormat = Microsoft.CognitiveServices.Speech.OutputFormat.Detailed
        Dim recognizer As New SpeechRecognizer(SpeechConfig, audioConfig)
        Dim result = recognizer.RecognizeOnceAsync().Result

        Select Case result.Reason
            Case ResultReason.RecognizedSpeech
                Console.WriteLine($"RECOGNIZED: Text={result.Text}")
                Console.WriteLine($"    Intent not recognized.")
            Case ResultReason.NoMatch
                Console.WriteLine($"NOMATCH: Speech could not be recognized.")
            Case ResultReason.Canceled
                Dim cancellation = CancellationDetails.FromResult(result)
                Console.WriteLine($"CANCELED: Reason={cancellation.Reason}")

                If cancellation.Reason = CancellationReason.[Error] Then
                    Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}")
                    Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}")
                    Console.WriteLine($"CANCELED: Did you update the subscription info?")
                End If
        End Select

    End Sub

End Module

You can download the audio file from GitHub here: https://github.com/ullfindsmit/StackOverflowAssets/blob/master/abcdef.wav

Also, if you know where I could get more detailed STT data, I'd appreciate it. What I am looking for is something like a JSON output that gives the start time and end time along with each word and/or sentence.

Your help is much appreciated.

UPDATE: The async handlers did not work for me for some reason. However, the code below did:

        While True
            Dim result = recognizer.RecognizeOnceAsync().Result
            Select Case result.Reason
                Case ResultReason.RecognizedSpeech
                    Console.WriteLine($"RECOGNIZED: Text={result.Text}")
                    Console.WriteLine($"    Intent not recognized.")
                Case ResultReason.NoMatch
                    Console.WriteLine($"NOMATCH: Speech could not be recognized.")
                Case ResultReason.Canceled
                    Dim cancellation = CancellationDetails.FromResult(result)
                    Console.WriteLine($"CANCELED: Reason={cancellation.Reason}")

                    If cancellation.Reason = CancellationReason.[Error] Then
                        Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}")
                        Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}")
                        Console.WriteLine($"CANCELED: Did you update the subscription info?")
                    End If

                    Exit While
            End Select
        End While

Solution

  • The RecognizeOnceAsync method recognizes only "once": it returns the first utterance/phrase contained in the audio data file and then stops. If you'd like to recognize more than one phrase, you can do one of these two things:

    1. Call RecognizeOnceAsync repeatedly. After the last phrase is recognized, the next call to the method will return a result that has result.Reason set to Canceled.

    2. Switch from using RecognizeOnceAsync to using StartContinuousRecognitionAsync and hook event handlers up to the Recognizing event (interim hypotheses while a phrase is still being spoken) and the Recognized event (the final result for each phrase). The event callback will allow you to see the results by inspecting the SpeechRecognitionEventArgs passed, like this: e.Result ...
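
    Option 2 could look roughly like the sketch below. It is only a sketch: the key, region, and file name are placeholders, and the module name is made up for illustration. A likely reason event handlers appear "not to work" in a console app is that Main returns before any event fires, so this version waits on a TaskCompletionSource until the session stops or is canceled.

    ```vb
    Imports System
    Imports System.Threading.Tasks
    Imports Microsoft.CognitiveServices.Speech
    Imports Microsoft.CognitiveServices.Speech.Audio

    Module ContinuousDemo   ' hypothetical module name

        Sub Main()
            ' Block the console app until recognition has finished.
            RunAsync().GetAwaiter().GetResult()
        End Sub

        Private Async Function RunAsync() As Task
            ' Placeholders: substitute your own key, region, and WAV path.
            Dim config = SpeechConfig.FromSubscription("YOUR-KEY", "YOUR-REGION")

            Using audioInput = AudioConfig.FromWavFileInput("abcdef.wav")
                Using recognizer As New SpeechRecognizer(config, audioInput)
                    ' Completed once the session stops or is canceled.
                    Dim done As New TaskCompletionSource(Of Boolean)()

                    ' Interim hypotheses while a phrase is still in progress.
                    AddHandler recognizer.Recognizing,
                        Sub(sender, e) Console.WriteLine($"RECOGNIZING: {e.Result.Text}")

                    ' Final result, raised once per recognized phrase.
                    AddHandler recognizer.Recognized,
                        Sub(sender, e)
                            If e.Result.Reason = ResultReason.RecognizedSpeech Then
                                Console.WriteLine($"RECOGNIZED: {e.Result.Text}")
                            End If
                        End Sub

                    ' Raised on errors and when the end of the file is reached.
                    AddHandler recognizer.Canceled,
                        Sub(sender, e)
                            Console.WriteLine($"CANCELED: Reason={e.Reason}")
                            done.TrySetResult(True)
                        End Sub

                    AddHandler recognizer.SessionStopped,
                        Sub(sender, e) done.TrySetResult(True)

                    Await recognizer.StartContinuousRecognitionAsync()
                    Await done.Task
                    Await recognizer.StopContinuousRecognitionAsync()
                End Using
            End Using
        End Function

    End Module
    ```

    With the "A B C D E F" file, you would expect one RECOGNIZED line per phrase the service detects, followed by a CANCELED line when the end of the file is reached.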

    You can see both of these behaviors by running the Speech CLI like this:

    spx recognize --once+ --key YOUR-KEY --region YOUR-REGION --file "https://github.com/ullfindsmit/StackOverflowAssets/blob/master/abcdef.wav"
    spx recognize --continuous --key YOUR-KEY --region YOUR-REGION --file "https://github.com/ullfindsmit/StackOverflowAssets/blob/master/abcdef.wav"
    

    You can download the Speech CLI here: https://aka.ms/speech/spx-zips.zip
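
    On the second part of the question (start and end times per word), one approach that may help is to request word-level timestamps and read the raw JSON the service returns. The sketch below assumes a recent Speech SDK version; the key, region, file name, and module name are placeholders. As far as I know, the JSON contains an NBest array whose entries carry a Words list with Offset and Duration values in 100-nanosecond units, but check the output of your own SDK version rather than relying on that shape.

    ```vb
    Imports System
    Imports Microsoft.CognitiveServices.Speech
    Imports Microsoft.CognitiveServices.Speech.Audio

    Module DetailedJsonDemo   ' hypothetical module name

        Sub Main()
            ' Placeholders: substitute your own key, region, and WAV path.
            Dim config = SpeechConfig.FromSubscription("YOUR-KEY", "YOUR-REGION")
            config.OutputFormat = OutputFormat.Detailed
            ' Ask the service to include per-word timing in the response.
            config.RequestWordLevelTimestamps()

            Using audioInput = AudioConfig.FromWavFileInput("abcdef.wav")
                Using recognizer As New SpeechRecognizer(config, audioInput)
                    ' Recognizes only the first phrase; loop or use continuous
                    ' recognition to cover the whole file.
                    Dim result = recognizer.RecognizeOnceAsync().GetAwaiter().GetResult()

                    If result.Reason = ResultReason.RecognizedSpeech Then
                        ' Phrase-level timing exposed directly on the result.
                        Console.WriteLine($"Offset (ticks): {result.OffsetInTicks}, Duration: {result.Duration}")

                        ' Raw service response as a JSON string.
                        Dim json = result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult)
                        Console.WriteLine(json)
                    End If
                End Using
            End Using
        End Sub

    End Module
    ```

    The same property lookup works on e.Result inside a Recognized handler, so it can be combined with continuous recognition to collect timings for every phrase in the file.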