Azure Speech SDK Speech-to-Text: Streaming Only a Segment of an Audio File

I have been working with Azure's Speech-to-Text service, using the "recognize from in-memory stream" method. What I plan to do is stream only certain segments of the audio to the service, but I am not sure how to do so. Say I have a 5-minute video and I only want to stream the first 30 seconds of its audio, or just the part from the 1-minute mark to the 3-minute mark. What would I need to enable or change in the following code to do that?

I tried using CreatePullStream() instead of CreatePushStream() and passing the mark in seconds, but it did not give the result described above. If anyone knows how to achieve this, please let me know. Many thanks!

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program 
{
    async static Task FromStream(SpeechConfig speechConfig)
    {
        using var reader = new BinaryReader(File.OpenRead("audioFile.wav"));
        using var audioInputStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(audioInputStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        byte[] readBytes;
        while ((readBytes = reader.ReadBytes(1024)).Length > 0)
        {
            audioInputStream.Write(readBytes, readBytes.Length);
        }
        audioInputStream.Close(); // signal the end of the audio to the recognizer

        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }

    async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<paste-your-subscription-key>", "<paste-your-region>");
        await FromStream(speechConfig);
    }
}

Solution

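You can use NAudio (NAudio.Wave) to cut the segment you need out of the source .wav file and then push only that cut file to the recognizer. For example, to recognize the content between the 1-minute and 3-minute marks of a .wav file, try the code below: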

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using NAudio.Wave;

public class Program
{
    public async static Task FromStream(SpeechConfig speechConfig)
    {
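        var inputAudioPath = @"<path>";
        var outputAudioPath = @"<path>";
        var startAt = new TimeSpan(0, 1, 0); // start at the 1-minute mark
        var duration = new TimeSpan(0, 2, 0); // keep 2 minutes, i.e. 1:00 to 3:00

        CutAudio(inputAudioPath, outputAudioPath, startAt, duration);

        // The push stream's default format is 16 kHz, 16-bit, mono PCM. CutAudio keeps the source
        // sample rate and channel count, so this assumes the source .wav is already in that format;
        // otherwise pass AudioStreamFormat.GetWaveFormatPCM(...) to CreatePushStream.
        using var reader = new BinaryReader(File.OpenRead(outputAudioPath));
        using var audioInputStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(audioInputStream);
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        byte[] readBytes;
        while ((readBytes = reader.ReadBytes(1024)).Length > 0)
        {
            audioInputStream.Write(readBytes, readBytes.Length);
        }
        audioInputStream.Close(); // tell the recognizer no more audio is coming

        // RecognizeOnceAsync returns only the first utterance in the segment
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");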
    }


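    public static void CutAudio(string inputPath, string destPath, TimeSpan startAt, TimeSpan duration)
    {
        // AudioFileReader decodes the file to 32-bit float samples (an ISampleProvider)
        using (var reader = new AudioFileReader(inputPath))
        {
            reader.CurrentTime = startAt; // jump forward to the position we want to start from
            // Take() stops after `duration`; CreateWaveFile16 writes the samples back as 16-bit PCM
            WaveFileWriter.CreateWaveFile16(destPath, reader.Take(duration));
        }
    }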

    public async static Task Main(string[] args)
    {
        var speechConfig = SpeechConfig.FromSubscription("<key>", "<region>");
        await FromStream(speechConfig);
    }
}


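By the way, RecognizeOnceAsync recognizes a single utterance (it stops at the first pause), so for a long segment like the 2-minute one above you need continuous recognition (StartContinuousRecognitionAsync) instead. For recognizing long audio, see this official doc and my previous post here.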
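Alternatively, if you would rather not write a temporary file, you can push only the bytes that belong to the segment. In uncompressed PCM audio, time maps linearly to bytes: byte offset = (seconds × sample rate) × block align. For 16 kHz, 16-bit mono audio (block align 2 bytes), the 1-minute mark is 60 × 16000 × 2 = 1,920,000 bytes into the "data" chunk. The sketch below is a hypothetical helper, WavSegment.CopySegment, which is not part of NAudio or the Speech SDK. It assumes an uncompressed PCM .wav file, walks the RIFF chunks to find "fmt " and "data" rather than assuming a fixed 44-byte header, and hands the selected bytes to a callback.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of NAudio or the Speech SDK).
// Assumes an uncompressed PCM .wav file.
public static class WavSegment
{
    // Copies the PCM bytes in [start, start + duration) of the "data" chunk to `write`,
    // in pieces of at most `bufferSize` bytes. Returns the number of bytes written.
    public static long CopySegment(Stream wav, TimeSpan start, TimeSpan duration,
                                   Action<byte[], int> write, int bufferSize = 4096)
    {
        using var br = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true);
        if (Encoding.ASCII.GetString(br.ReadBytes(4)) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        br.ReadUInt32(); // RIFF chunk size, not needed here
        if (Encoding.ASCII.GetString(br.ReadBytes(4)) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        int sampleRate = 0, blockAlign = 0;
        while (true)
        {
            byte[] id = br.ReadBytes(4);
            if (id.Length < 4) throw new InvalidDataException("No \"data\" chunk found.");
            uint size = br.ReadUInt32();
            string name = Encoding.ASCII.GetString(id);

            if (name == "fmt ")
            {
                if (size < 16) throw new InvalidDataException("\"fmt \" chunk too short.");
                br.ReadUInt16();                   // format tag (1 = PCM)
                br.ReadUInt16();                   // channel count
                sampleRate = (int)br.ReadUInt32(); // frames per second
                br.ReadUInt32();                   // byte rate (recomputed from frames instead)
                blockAlign = br.ReadUInt16();      // bytes per frame, all channels
                br.ReadBytes((int)(size - 14 + (size & 1))); // bits per sample, extensions, pad byte
            }
            else if (name == "data")
            {
                if (blockAlign == 0) throw new InvalidDataException("\"data\" chunk before \"fmt \" chunk.");
                // Time -> whole frames -> bytes, so the cut never splits a sample frame
                long startByte = Math.Min((long)(start.TotalSeconds * sampleRate) * blockAlign, size);
                long remaining = Math.Min((long)(duration.TotalSeconds * sampleRate) * blockAlign, size - startByte);
                var buffer = new byte[bufferSize];

                // Read past the audio before the segment (also works for non-seekable streams)
                for (long skip = startByte; skip > 0;)
                {
                    int n = br.Read(buffer, 0, (int)Math.Min(buffer.Length, skip));
                    if (n == 0) return 0; // file is shorter than its header claims
                    skip -= n;
                }

                long written = 0;
                while (remaining > 0)
                {
                    int n = br.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                    if (n == 0) break; // truncated file
                    write(buffer, n);
                    written += n;
                    remaining -= n;
                }
                return written;
            }
            else
            {
                br.ReadBytes((int)(size + (size & 1))); // skip other chunks (LIST, fact, ...) and pad byte
            }
        }
    }
}
```

To use it in FromStream, you would open the original file with File.OpenRead, pass `(buf, n) => audioInputStream.Write(buf, n)` as `write`, and call `audioInputStream.Close()` afterwards. Because only raw samples are pushed (no header), the push stream format must match the file: the default is 16 kHz, 16-bit mono PCM; for anything else, create the stream with `AudioInputStream.CreatePushStream(AudioStreamFormat.GetWaveFormatPCM(...))`. Audio from a video still has to be extracted to a PCM .wav first.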