
Speech-to-text large audio files [Microsoft Speech API]


What is the best way to transcribe medium-to-large audio files, roughly 6-10 minutes each, using the Microsoft Speech API? Is there something like batch transcription for audio files?

I have used the code provided in https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text-sample for continuously transcribing speech, but it stops transcribing at some point. Is there a limit on how much audio can be transcribed? I am only using the free trial account at the moment.

Btw, I assume there is no difference between Bing Speech API and the new Speech service API, right?

Thanks everyone!


Solution

  • Thank you for your feedback.

    I agree that the sample (and the documentation you are looking at) is not very clear; we will update this soon.

    The sample uses RecognizeAsync, which should really be called RecognizeOnceAsync: it currently returns only the FIRST FinalResult from the service and then stops. For longer audio, use Start/StopRecognizeAsync instead, and register to receive Result events.

    Again, sorry for the bad documentation here; we will update it soon, and will probably also rename the API in a refresh.

    If you have audio files, you could also use the batch transcription feature. Perhaps that helps? https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription

    Cheers, Wolfgang
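
To illustrate the "start, register for result events, stop" pattern described in the answer, here is a minimal sketch in Python. It assumes the `azure-cognitiveservices-speech` package, where the continuous calls are named `start_continuous_recognition` and `stop_continuous_recognition` (not the Start/StopRecognizeAsync names quoted above, which may differ between SDK versions and languages). `TranscriptCollector` and `transcribe_file` are hypothetical helper names, and the key, region, and file path are placeholders you would supply yourself.

```python
import threading


class TranscriptCollector:
    """Accumulates the text of every final result until the session ends."""

    def __init__(self):
        self.segments = []
        self.done = threading.Event()

    def on_recognized(self, evt):
        # Fired once per final result; a long file produces many of these,
        # which is why a single "recognize once" call stops after the first.
        text = evt.result.text
        if text:
            self.segments.append(text)

    def on_stopped(self, evt):
        # Fired when the session stops or is canceled (e.g. end of file).
        self.done.set()

    def text(self):
        return " ".join(self.segments)


def transcribe_file(path, key, region):
    # Imported lazily so the collector can be exercised without the SDK.
    # Assumption: the azure-cognitiveservices-speech package is installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    collector = TranscriptCollector()
    recognizer.recognized.connect(collector.on_recognized)
    recognizer.session_stopped.connect(collector.on_stopped)
    recognizer.canceled.connect(collector.on_stopped)

    recognizer.start_continuous_recognition()
    collector.done.wait()  # block until the whole file has been processed
    recognizer.stop_continuous_recognition()
    return collector.text()
```

Calling `transcribe_file("meeting.wav", "<your-key>", "<your-region>")` would return the concatenated transcript once the service signals the end of the session. Waiting on the stop/cancel events rather than on the first result is what keeps transcription going past the first utterance.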