I'm trying to generate and collect data using Azure's speech to text code. I want to generate timestamps, reduce redundancies in the output, and export to Excel. The code below runs with no errors:
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

namespace NEST
{
    internal class NewBaseType
    {
        static async Task Main(string[] args)
        {
            // Creates an instance of a speech config with specified subscription key and region.
            // Replace with your own subscription key and service region (e.g., "westus").
            var config = SpeechConfig.FromSubscription("subscriptionkey", "region");
            // Generates timestamps
            config.OutputFormat = OutputFormat.Detailed;
            // Calls the audio file
            using (var audioInput = AudioConfig.FromWavFileInput("C:/Users/MichaelSchwartz/source/repos/AI-102-Process-Speech-master/transcribe_speech_to_text/media/narration.wav"))
            // Creates a speech recognizer from the audio file.
            using (var recognizer = new SpeechRecognizer(config, audioInput))
            {
                // Subscribes to events.
                recognizer.Recognizing += (s, e) =>
                {
                    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
                };
                recognizer.Recognized += (s, e) =>
                {
                    var result = e.Result;
                    Console.WriteLine($"Reason: {result.Reason.ToString()}");
                    if (result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"Final result: Text: {result.Text}.");
                    }
                };
                recognizer.Canceled += (s, e) =>
                {
                    Console.WriteLine($"\n Canceled. Reason: {e.Reason.ToString()}, CanceledReason: {e.Reason}");
                };
                recognizer.SessionStarted += (s, e) =>
                {
                    Console.WriteLine("\n Session started event.");
                };
                recognizer.SessionStopped += (s, e) =>
                {
                    Console.WriteLine("\n Session stopped event.");
                };
                recognizer.Recognized += (s, e) =>
                {
                    var j = e.Result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
                };
                // Starts continuous recognition.
                // Uses StopContinuousRecognitionAsync() to stop recognition.
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);
                do
                {
                    Console.WriteLine("Press Enter to stop");
                } while (Console.ReadKey().Key != ConsoleKey.Enter);
                // Stops recognition.
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
    }
}
When I run it, I don't see timestamp data. How do I generate timestamp data?
Also, is there a way to remove redundancies in the output? Example:
RECOGNIZING: Text=the speech
RECOGNIZING: Text=the speech translation
RECOGNIZING: Text=the speech translation API
RECOGNIZING: Text=the speech translation API transcribes
RECOGNIZING: Text=the speech translation API transcribes audio
I just want the final result. Is there a way to remove the "RECOGNIZING:" data from the output while preserving accuracy? Thanks in advance!
To remove the "RECOGNIZING:" lines, just delete this handler. The final text still arrives through the Recognized event, so accuracy is unaffected:
recognizer.Recognizing += (s, e) =>
{
    Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
};
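I don't see where you export the result and timestamps to Excel. For the timestamps, you could use this code once you have the SpeechRecognitionResult. With OutputFormat.Detailed, the JSON it returns includes Offset and Duration values (in 100-nanosecond units), so you need to print or parse it rather than only assigning it to a variable: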
var json = result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);
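As a rough sketch of how that JSON could be turned into something Excel can open, the helper below reads the top-level Offset and Duration fields and formats one CSV row per final result. The class name TimestampCsv, the method ToCsvRow, the column layout, and the sample payload are all made up for illustration; only the Offset and Duration field names come from the service response. It uses System.Text.Json and needs no Speech SDK reference, so it can be tried on its own.

```csharp
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper, not part of the Speech SDK.
public static class TimestampCsv
{
    // Builds one CSV row: start time (s), end time (s), recognized text.
    // "json" is the string returned by SpeechServiceResponse_JsonResult,
    // "text" is result.Text from the same Recognized event.
    public static string ToCsvRow(string json, string text)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;

        // Offset and Duration are in 100-nanosecond units, the same unit as TimeSpan ticks.
        TimeSpan start = TimeSpan.FromTicks(root.GetProperty("Offset").GetInt64());
        TimeSpan length = TimeSpan.FromTicks(root.GetProperty("Duration").GetInt64());

        // Quote the text and double any embedded quotes so commas do not split the column.
        string quoted = "\"" + text.Replace("\"", "\"\"") + "\"";

        return string.Join(",",
            start.TotalSeconds.ToString("F2", CultureInfo.InvariantCulture),
            (start + length).TotalSeconds.ToString("F2", CultureInfo.InvariantCulture),
            quoted);
    }

    public static void Main()
    {
        // Illustrative payload with made-up values, trimmed to the fields used here.
        string sampleJson = "{\"RecognitionStatus\":\"Success\",\"Offset\":5000000,\"Duration\":25000000}";

        Console.WriteLine("start_s,end_s,text");
        Console.WriteLine(ToCsvRow(sampleJson, "The speech translation API transcribes audio."));
    }
}
```

In the original program, the second Recognized handler could call ToCsvRow(j, e.Result.Text) and append the returned line to a .csv file, which Excel opens directly. Writing a native .xlsx file would need an extra library, which is outside this sketch.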