Tags: android, maui, speech-recognition, text-to-speech

MAUI Speech Recognition - Unable to Store Final Result of a Processed Speech When SpeechInputCompleteSilence Intent Is Enabled


I'm using Gerald Versluis's method to enable speech recognition and speech-to-text on Android with MAUI: https://github.com/jfversluis/MauiSpeechToTextSample

I'm trying to make the recognized text permanent by moving it to another label so that it doesn't get overwritten by new incoming speech, but I can't seem to access the value that holds the final processed speech.

In the RecordButton method, the value that contains the final recognized speech is RecognitionText. From MainPage.xaml.cs:

    private async void RecordButton()
    {
        var isAuthorized = await speechToText.RequestPermissions();

        if (isAuthorized)
        {
            try
            {
                RecognitionText = await speechToText.Listen(
                    System.Globalization.CultureInfo.GetCultureInfo("en-us"),
                    new Progress<string>(partialText =>
                    {
                        RecognitionText = partialText;
                        OnPropertyChanged(nameof(RecognitionText));
                    }),
                    tokenSource.Token);

                // When ExtraSpeechInputCompleteSilenceLengthMillis is used, this code is never reached.
                // When ExtraSpeechInputCompleteSilenceLengthMillis is commented out, this code is reached and everything works great.
                HistoryText += RecognitionText + Environment.NewLine + Environment.NewLine;
                OnPropertyChanged(nameof(HistoryText));
            }
            catch (Exception ex)
            {
                await DisplayAlert("Error", ex.ToString(), "OK");
            }
        }
        else
        {
            await DisplayAlert("Permission Error", "No microphone access", "OK");
        }
    }

Here's the XAML part from MainPage.xaml:

    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="3*" />
            <RowDefinition Height="*" />
            <RowDefinition Height="*" />
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="*" />
        </Grid.ColumnDefinitions>

        <Border>
            <ScrollView Margin="10">
                <Label
                    Grid.Row="0"
                    x:Name="HistoryLabel"
                    Text="{Binding HistoryText}"
                    FontSize="{OnIdiom Phone=Medium, Tablet=Large}" />
            </ScrollView>
        </Border>
        <Label
            Grid.Row="1"
            x:Name="RecognitionLabel"
            Text="{Binding RecognitionText}"
            FontSize="{OnIdiom Phone=Medium, Tablet=Large}"
            Margin="5"
            Padding="5" />
        <Button
            Grid.Row="2"
            CornerRadius="60"
            WidthRequest="120"
            HeightRequest="120"
            Text="Record"
            FontSize="24"
            FontAttributes="Bold"
            Command="{Binding RecordButtonCommand}" />

    </Grid>

This is the Listen method definition in SpeechToTextImplementation.cs:

        public async Task<string> Listen(CultureInfo culture, IProgress<string> recognitionResult, CancellationToken cancellationToken)
        {
            var taskResult = new TaskCompletionSource<string>();

            listener = new SpeechRecognitionListener
            {
                Error = ex => taskResult.TrySetException(new Exception("Failure in speech engine - " + ex)),
                PartialResults = sentence =>
                {
                    recognitionResult?.Report(sentence);
                },
                Results = sentence => taskResult.TrySetResult(sentence)
            };

            speechRecognizer = SpeechRecognizer.CreateSpeechRecognizer(Android.App.Application.Context);

            if (speechRecognizer is null)
                throw new ArgumentException("Speech recognizer is not available.");

            speechRecognizer.SetRecognitionListener(listener);
            speechRecognizer.StartListening(CreateSpeechIntent(culture));

            await using (cancellationToken.Register(() =>
            {
                StopRecording();
                taskResult.TrySetCanceled();
            }))
            {
                return await taskResult.Task;
            }
        }
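
A note on why the lines after `await speechToText.Listen(...)` can be skipped: `Listen` returns `taskResult.Task`, which completes only when `Results`, `Error`, or the cancellation callback fires. If none of them fires, the `await` never resumes. The code shown here does not confirm which callbacks Android actually raises when the silence extra is set, so the following is only a sketch of the mechanism, runnable outside Android. `FakeListener` and `AwaitFinalOrTimeout` are hypothetical names, and the timeout is an addition for diagnosis, not part of the sample project.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for SpeechRecognitionListener: only the two callbacks.
public sealed class FakeListener
{
    public Action<string> PartialResults { get; set; } = _ => { };
    public Action<string> Results { get; set; } = _ => { };
}

public static class ListenSketch
{
    // Mirrors the shape of Listen(): the task completes only when Results fires.
    // Assumption: a timeout is added so that a missing Results callback shows up
    // as (lastPartial, IsFinal: false) instead of an await that never returns.
    public static async Task<(string Text, bool IsFinal)> AwaitFinalOrTimeout(
        FakeListener listener, TimeSpan timeout)
    {
        var taskResult = new TaskCompletionSource<string>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        var lastPartial = string.Empty;

        // Remember the most recent partial text as a fallback.
        listener.PartialResults = sentence => lastPartial = sentence;
        listener.Results = sentence => taskResult.TrySetResult(sentence);

        var winner = await Task.WhenAny(taskResult.Task, Task.Delay(timeout));
        return winner == taskResult.Task
            ? (await taskResult.Task, true)
            : (lastPartial, false);
    }
}
```

If `Results` is invoked, the call returns the final sentence with `IsFinal` set to true. If only `PartialResults` is invoked, it returns the last partial text with `IsFinal` set to false once the timeout elapses. Logging that flag in the real listener would show whether the final callback is ever delivered.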

And finally, the CreateSpeechIntent method with the problematic intent extra:

        private static Intent CreateSpeechIntent(CultureInfo culture)
        {
            var intent = new Intent(RecognizerIntent.ActionRecognizeSpeech);
            intent.PutExtra(RecognizerIntent.ExtraLanguagePreference, Java.Util.Locale.Default);

            var javaLocale = Java.Util.Locale.ForLanguageTag(culture.Name);
            intent.PutExtra(RecognizerIntent.ExtraLanguage, javaLocale);
            intent.PutExtra(RecognizerIntent.ExtraLanguageModel, RecognizerIntent.LanguageModelFreeForm);
            intent.PutExtra(RecognizerIntent.ExtraCallingPackage, Android.App.Application.Context.PackageName);

            // --> Problem: enabling the next line triggers the issue.
            //intent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 10000);

            //intent.PutExtra(RecognizerIntent.ExtraBiasingStrings, ...) // SCA Use ATC strings!!
            //intent.PutExtra(RecognizerIntent.ExtraMaxResults, 1);
            //intent.PutExtra(RecognizerIntent.ExtraSpeechInputPossiblyCompleteSilenceLengthMillis, 1500);
            //intent.PutExtra(RecognizerIntent.ExtraSpeechInputMinimumLengthMillis, 15000); // SCA Are you done talking? Yes or no. Set right here.

            intent.PutExtra(RecognizerIntent.ExtraPartialResults, true);

            return intent;
        }

What I've tried:

  • Commenting out intent.PutExtra(RecognizerIntent.ExtraSpeechInputCompleteSilenceLengthMillis, 10000); fixes the problem. Unfortunately, I need to keep the mic open for 10 seconds.
  • I tried different values for ExtraSpeechInputCompleteSilenceLengthMillis, but every value causes the problem. That was for debugging only, since I need the full 10 seconds.
  • I tried to access RecognitionText from SpeechToTextImplementation.cs, but no matter what I tried, the UI would not update with the value sent. SpeechToTextImplementation.cs is in the Android folder.

Solution

  • I resolved this by switching to the .NET MAUI Community Toolkit, which provides the methods needed for speech processing, so there is no longer a need to implement it manually. Use the toolkit's RecognitionResultUpdated and RecognitionResultCompleted events to get the partial and final results, respectively.
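
To show how the partial and final results could map onto the two labels from the question, here is a minimal sketch of a view model. `SpeechViewModel`, `OnPartialResult`, and `OnFinalResult` are hypothetical names; only `RecognitionText` and `HistoryText` come from the question. The toolkit wiring is deliberately left out, since the event-args types differ between toolkit versions and should be checked against its documentation.

```csharp
#nullable enable
using System;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical view model: the toolkit event handlers would call the two
// On...Result methods with the recognized text.
public sealed class SpeechViewModel : INotifyPropertyChanged
{
    private string recognitionText = string.Empty;
    private string historyText = string.Empty;

    public event PropertyChangedEventHandler? PropertyChanged;

    public string RecognitionText
    {
        get => recognitionText;
        private set { recognitionText = value; OnPropertyChanged(); }
    }

    public string HistoryText
    {
        get => historyText;
        private set { historyText = value; OnPropertyChanged(); }
    }

    // Partial text only overwrites the live label.
    public void OnPartialResult(string partialText) => RecognitionText = partialText;

    // Final text is shown and also appended to the history, so later
    // partial results cannot overwrite it.
    public void OnFinalResult(string finalText)
    {
        if (string.IsNullOrWhiteSpace(finalText)) return;
        RecognitionText = finalText;
        HistoryText += finalText + Environment.NewLine + Environment.NewLine;
    }

    private void OnPropertyChanged([CallerMemberName] string? propertyName = null) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}
```

With this shape, a handler subscribed to RecognitionResultUpdated would forward its text to `OnPartialResult`, and a handler subscribed to RecognitionResultCompleted would forward the final text to `OnFinalResult`. The XAML bindings to `RecognitionText` and `HistoryText` from the question can stay as they are.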