Tags: c#, wpf, azure-cognitive-services

Audio transcription with Azure


I have an audio file in WAV format (https://youtu.be/nunJDdIlnnk), but for some reason I cannot get the whole transcription into the TextBox. However, in the SpeechRecognizer_Recognizing handler I can see in my Debug output window that everything is being recognized as the woman speaks.

        // (Body of an async method; its signature was not included in the question.)
        var config = SpeechConfig.FromSubscription(KEY, Region);
        MessageBox.Show("Ready to use speech service in " + config.Region);

        // Configure voice (this setting only affects speech synthesis, not recognition)
        config.SpeechSynthesisVoiceName = "en-US-AriaNeural";

        // Configure speech recognition
        var taskCompletionSource = new TaskCompletionSource<int>();

        using var audioConfig = AudioConfig.FromWavFileInput(FilePath);
        using var speechRecognizer = new SpeechRecognizer(config, audioConfig);
        speechRecognizer.Recognizing += SpeechRecognizer_Recognizing;
        speechRecognizer.Recognized += SpeechRecognizer_Recognized;
        speechRecognizer.SessionStarted += SpeechRecognizer_SessionStarted;
        speechRecognizer.SessionStopped += SpeechRecognizer_SessionStopped;

        await speechRecognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        // Note: nothing in this code ever completes taskCompletionSource,
        // so this call never returns and the Stop call below is never reached.
        Task.WaitAny(new[] { taskCompletionSource.Task });

        await speechRecognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }

    private void SpeechRecognizer_SessionStopped(object? sender, SessionEventArgs e)
    {
        Debug.WriteLine("Stopped");
    }

    private void SpeechRecognizer_SessionStarted(object? sender, SessionEventArgs e)
    {
        Debug.WriteLine("Started");
    }

    private void SpeechRecognizer_Recognized(object? sender, SpeechRecognitionEventArgs e)
    {
        // A new, empty StringBuilder is created for every Recognized event
        var transcriptionStringBuilder = new StringBuilder();
        if (e.Result.Reason == ResultReason.RecognizedSpeech)
        {
            transcriptionStringBuilder.Append(e.Result.Text);

            Text = transcriptionStringBuilder.ToString(); // This is the text displayed in the TextBox

            IsVisible = Visibility.Visible;
        }
    }

    private void SpeechRecognizer_Recognizing(object? sender, SpeechRecognitionEventArgs e)
    {
        Debug.WriteLine("Recognizing: " + e.Result.Text);
    }
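A side issue in the code above: nothing ever completes taskCompleteionSource, so the wait never finishes and StopContinuousRecognitionAsync is never reached. Microsoft's continuous-recognition samples complete the source from the SessionStopped (and Canceled) handlers. The sketch below shows that pattern without the SDK: FakeRecognizer and WaitForStopDemo are hypothetical stand-ins so it runs without an Azure key, and the 5-second timeout is an assumption added only so the demo cannot hang.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for SpeechRecognizer (not an SDK type): it raises
// SessionStopped from a worker thread once its simulated input runs out.
public sealed class FakeRecognizer
{
    public event EventHandler? SessionStopped;

    public Task StartContinuousRecognitionAsync()
    {
        // Simulate the input ending shortly after recognition starts.
        _ = Task.Run(async () =>
        {
            await Task.Delay(50);
            SessionStopped?.Invoke(this, EventArgs.Empty);
        });
        return Task.CompletedTask;
    }
}

public static class WaitForStopDemo
{
    // Returns true if the wait finished because the session stopped,
    // false if the assumed 5-second safety timeout won instead.
    public static async Task<bool> RunAsync()
    {
        var recognizer = new FakeRecognizer();
        var stopped = new TaskCompletionSource<int>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Something must complete the TaskCompletionSource; otherwise
        // waiting on stopped.Task never returns.
        recognizer.SessionStopped += (sender, e) => stopped.TrySetResult(0);

        await recognizer.StartContinuousRecognitionAsync();

        // Await rather than Task.WaitAny, so no thread is blocked while waiting.
        Task finished = await Task.WhenAny(stopped.Task, Task.Delay(TimeSpan.FromSeconds(5)));
        return finished == stopped.Task;
    }
}
```

With the real SDK, the same TrySetResult(0) call would go in the SessionStopped handler, with the TaskCompletionSource stored in a field so the handler can reach it.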

[Screenshot: Output window]

This is my XAML code for the UI

 <Border Background="#272537"
        CornerRadius="20">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="75" />
            <RowDefinition />
        </Grid.RowDefinitions>

        <TextBlock Text="Transcribe Me"
                   Foreground="White"
                   Margin="10,0,0,0"
                   FontSize="23"
                   VerticalAlignment="Center" />

        <Grid Grid.Row="1">
            <Grid.ColumnDefinitions>
                <ColumnDefinition />
                <ColumnDefinition />
            </Grid.ColumnDefinitions>
            <Grid.RowDefinitions>
                <RowDefinition Height="*" />
                <RowDefinition />
                <RowDefinition Height="30" />
            </Grid.RowDefinitions>

            <syncfusion:SfHubTile Background="#FF1FAEFF">

                <behaviors:Interaction.Triggers>
                    <behaviors:EventTrigger EventName="Click">
                        <behaviors:InvokeCommandAction Command="{Binding AzureCommand}"
                                                       CommandParameter="Audio" />
                    </behaviors:EventTrigger>
                </behaviors:Interaction.Triggers>

                <iconPacks:PackIconMaterial Kind="Music" Width="120"
                                            Height="120"
                                            VerticalAlignment="Center"
                                            HorizontalAlignment="Center"
                                            Foreground="White" />

            </syncfusion:SfHubTile>

            <syncfusion:SfHubTile Grid.Column="1">
                <syncfusion:SfHubTile.Header>
                    <TextBlock Text="Document"
                               Foreground="White"
                               FontSize="18"
                               FontStyle="Oblique" />
                </syncfusion:SfHubTile.Header>
                <syncfusion:SfHubTile.Title>
                    <TextBlock Text="Translate"
                               Foreground="White"
                               FontSize="18"
                               HorizontalAlignment="Center"
                               FontStyle="Oblique" />
                </syncfusion:SfHubTile.Title>

                <behaviors:Interaction.Triggers>
                    <behaviors:EventTrigger EventName="Click">
                        <behaviors:InvokeCommandAction Command="{Binding AzureCommand}"
                                                       PassEventArgsToCommand="True"
                                                       CommandParameter="Document" />
                    </behaviors:EventTrigger>
                </behaviors:Interaction.Triggers>
            </syncfusion:SfHubTile>

            <TextBox Grid.Row="1"
                     IsReadOnly="True"
                     TextWrapping="Wrap"
                     Visibility="{Binding IsVisible}"
                     Text="{Binding Text, UpdateSourceTrigger=PropertyChanged}"
                     AcceptsReturn="True"
                     Grid.ColumnSpan="2"
                     Margin="10" />

            <Button Grid.Column="1"
                    Grid.Row="2"
                    Visibility="{Binding IsVisible}"
                    Content="Copy"
                    Command="{Binding CopyCommand}"
                    Foreground="White"
                    FontWeight="Black"
                    VerticalAlignment="Center"
                    Margin="0,0,10,10">
            </Button>
        </Grid>
    </Grid>
</Border>
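The TextBox above only picks up new values of Text when its DataContext raises PropertyChanged for that property. The question does not show its view model, so the following is an assumed minimal version; TranscriptionViewModel is a hypothetical name, and IsVisible is left out because it uses WPF's Visibility type.

```csharp
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical view model backing {Binding Text}; the question's real class is not shown.
public class TranscriptionViewModel : INotifyPropertyChanged
{
    private string _text = "";

    public event PropertyChangedEventHandler? PropertyChanged;

    public string Text
    {
        get => _text;
        set
        {
            if (_text == value) return;   // skip redundant notifications
            _text = value;
            OnPropertyChanged();          // CallerMemberName supplies "Text"
        }
    }

    protected void OnPropertyChanged([CallerMemberName] string? propertyName = null) =>
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(propertyName));
}
```

Since the TextBox does show the last sentence, the binding itself is evidently working; the missing text comes from how Text is assigned.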

When I run the code, my TextBox only contains the last part of the audio, which is "Do I look alright".
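A minimal, SDK-free simulation of the Recognized handler shows why only the last sentence survives. OverwriteDemo is a hypothetical helper, and plain strings stand in for the SpeechRecognitionEventArgs results.

```csharp
using System.Collections.Generic;
using System.Text;

public static class OverwriteDemo
{
    // Mirrors the handler's logic: a new StringBuilder per Recognized event,
    // whose contents are then assigned to the text, replacing the previous value.
    public static string LocalBuilderPerEvent(IEnumerable<string> sentences)
    {
        string text = "";
        foreach (var sentence in sentences)
        {
            var sb = new StringBuilder();   // fresh, empty builder every event
            sb.Append(sentence);
            text = sb.ToString();           // overwrites whatever was there
        }
        return text;
    }
}
```

Whatever sequence of sentences goes in, the result is just the final one, which matches the symptom.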


Solution

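  • The problem was that I was putting every single sentence she said into my TextBox on its own: each Recognized event created a new StringBuilder and assigned only that sentence to Text, so every sentence replaced the previous one.

    What you have to do is capture everything as it is recognized and assign it to the TextBox in the SessionStopped handler. Collect the recognized pieces (for example in a list), loop through them, and build the result with a StringBuilder, because using str += item creates a new string every time and is less efficient.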
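The fix can be sketched without the SDK as follows. TranscriptCollector is a hypothetical class: in the real code, the Recognized handler would call OnRecognized(e.Result.Text) when e.Result.Reason == ResultReason.RecognizedSpeech, and the SessionStopped handler would call OnSessionStopped(). The lock is an assumed precaution, because the SDK raises its events on its own threads.

```csharp
using System.Text;

// Hypothetical, SDK-free model of the fix; the handlers take plain strings
// instead of SpeechRecognitionEventArgs / SessionEventArgs.
public sealed class TranscriptCollector
{
    // One builder for the whole session, created once rather than per event.
    private readonly StringBuilder _transcript = new StringBuilder();
    private readonly object _gate = new object();

    // Stays empty until the session stops; in the question this role is
    // played by the bound Text property.
    public string Text { get; private set; } = "";

    // Call from the Recognized handler for each RecognizedSpeech result.
    public void OnRecognized(string sentence)
    {
        if (string.IsNullOrWhiteSpace(sentence)) return;
        lock (_gate)
        {
            if (_transcript.Length > 0) _transcript.Append(' ');
            _transcript.Append(sentence);
        }
    }

    // Call from the SessionStopped handler: publish the whole transcript once.
    public void OnSessionStopped()
    {
        lock (_gate)
        {
            Text = _transcript.ToString();
        }
    }
}
```

Appending to the shared builder and assigning Text inside the Recognized handler would also work, and would show the transcript growing while the audio plays; publishing in SessionStopped follows the approach described above.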