speech-to-text, azure-cognitive-services, azure-speech

How to use a webhook with Microsoft Cognitive Services Speech to Text V3


I'm trying to understand how to use a webhook in Microsoft Speech to Text V3. Following the docs, I was able to create a webhook and ping it. Now the webhook is called whenever a transcription completes, but the body of the request is always empty, which makes it of little use. Can anyone tell me what I am doing wrong?


Solution

  • The body of the request that you receive should contain content similar to this:

    {
      "self": "https://{CognitiveServicesEndpoint}/speechtotext/v3.0/transcriptions/{TranscriptionId}",
      "invocationId": "{InvocationId}"
    }
    

    You can do an HTTP GET on the URL in the self property of the body to get details on the entity. These details are deliberately not included in the request, due to possible trust concerns between the webhook receiver and the subscription owner.
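
    As a rough illustration of that GET, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the Ocp-Apim-Subscription-Key header is my assumption about how the call is authenticated; adjust it to whatever your resource expects.

```python
import json
import urllib.request


def build_entity_request(notification_body: bytes, subscription_key: str) -> urllib.request.Request:
    """Build a GET request for the URL found in the notification's "self" property."""
    payload = json.loads(notification_body)
    return urllib.request.Request(
        payload["self"],
        # Assumption: the service accepts the subscription key in this header.
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
        method="GET",
    )


def fetch_entity(notification_body: bytes, subscription_key: str) -> dict:
    """Perform the GET and return the decoded JSON details of the entity."""
    request = build_entity_request(notification_body, subscription_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

    Calling fetch_entity with the raw body your webhook received and your own key should return the entity details as a dictionary, assuming the key is valid for that endpoint.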

    Also, the request should have a header named X-MicrosoftSpeechServices-Event. It should contain the state of the transcription as one of the following, depending on what you subscribed to:

    • TranscriptionCreation
    • TranscriptionProcessing
    • TranscriptionCompletion
    • TranscriptionDeletion
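
    To show how the header and the body fit together on the receiving side, here is a small framework-independent Python sketch. The function name parse_notification and the idea of passing the headers as a plain dict are assumptions for illustration; your web framework will expose the headers and raw body in its own way.

```python
import json

EVENT_HEADER = "X-MicrosoftSpeechServices-Event"


def parse_notification(headers: dict, body: bytes) -> tuple:
    """Return (event, self_url) from a webhook request's headers and raw body."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    event = normalized.get(EVENT_HEADER.lower())
    # An empty body yields no URL instead of raising a JSON decoding error.
    payload = json.loads(body) if body.strip() else {}
    return event, payload.get("self")


# Example with a made-up URL in place of a real transcription:
event, self_url = parse_notification(
    {"X-MicrosoftSpeechServices-Event": "TranscriptionCompletion"},
    b'{"self": "https://example.invalid/transcriptions/1", "invocationId": "abc"}',
)
if event == "TranscriptionCompletion":
    print("Transcription finished, details at:", self_url)
```

    Reading the raw body bytes yourself, rather than relying on a framework's automatic parsing, also makes it easier to see whether the payload really arrives empty.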

    I just created a webhook with all of the above transcription event types, and I received the expected requests with the expected payload in the body. If you do not see the correct payload in the body, please let me know which endpoint (region) you are using so I can check that one specifically. There might be a bug in that specific datacenter.

    Kind regards

    Dirk