c# azure-media-services

Azure Media Services V3 audio analyzer transcript timestamp incorrect


The story: The timestamps in the generated transcript file lag slightly behind the video. We expect each caption to appear at the moment the words are spoken (e.g. in sync with the lips of the person who is talking), but the caption text always appears a bit late. Is there a way to tell the service, through the API, that the timestamps should be a bit earlier? Maybe the "experimentalOptions" of the AudioAnalyzerPreset could do the trick? Thank you.

I'm following this document: https://learn.microsoft.com/en-us/rest/api/media/transforms/create-or-update?tabs=HTTP#audioanalyzerpreset, but I found nothing useful for my case.

Here is the example I'm working with:

  • The content of the transcript file I got from the Media Services Audio Analyzer:

WEBVTT

NOTE duration:"00:00:48"

NOTE recognizability:0.886

NOTE language:en-us

NOTE Confidence: 0.9478216

00:00:00.000 --> 00:00:01.680
You know, our mission at

NOTE Confidence: 0.9478216

00:00:01.680 --> 00:00:03.360
Microsoft is to empower every

NOTE Confidence: 0.9478216

00:00:03.360 --> 00:00:04.572
person and every organization

NOTE Confidence: 0.9478216

00:00:04.572 --> 00:00:06.390
on the planet to be able

NOTE Confidence: 0.768422245

00:00:06.390 --> 00:00:08.060
to achieve more. Empowerment is

  • As you can see, a single sentence is split across multiple cues, each with its own timestamps. This causes trouble for viewers because the video runs slightly ahead of the captions. I made a gif to illustrate: https://imgur.com/KBCnaUp .
  • So my question is: how can I solve this problem programmatically? Should I edit only the timestamps, or both the content and the timestamps?

Solution

  • There is no API for modifying the timestamps in the generated VTT. However, you can post-process the VTT file yourself and adjust the timestamps to fit your requirements.

    We do not have a sample for caption reprocessing, but online caption editors can accomplish this. Another possibility is a plain text editor, although there are also editors built specifically for VTT files.

    I have relayed this feedback internally to our product engineering team.
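As a rough illustration of the "own processing" approach suggested above, here is a minimal Python sketch that moves every cue earlier (or later) by a fixed offset. It is not an official Media Services sample. The function names `shift_timestamp` and `shift_vtt` are hypothetical, the sketch assumes timestamps in the `HH:MM:SS.mmm` form seen in the transcript in the question, and the 300 ms offset in the usage lines is an arbitrary value you would tune by watching the result.

```python
import re

# Assumption: timestamps use the HH:MM:SS.mmm form shown in the transcript.
# WebVTT also allows MM:SS.mmm, which this sketch does not handle.
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3})")


def shift_timestamp(match, offset_ms: int) -> str:
    """Return the matched timestamp moved by offset_ms, clamped at zero."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    total_ms = max(0, total_ms + offset_ms)  # never go below 00:00:00.000
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift_vtt(vtt_text: str, offset_ms: int) -> str:
    """Shift the start and end time of every cue; leave all other lines alone."""
    shifted = []
    for line in vtt_text.splitlines():
        if "-->" in line:
            # count=2 rewrites only the start and end time, so cue text that
            # happens to follow on the same line is never modified.
            line = TIMESTAMP.sub(
                lambda m: shift_timestamp(m, offset_ms), line, count=2
            )
        shifted.append(line)
    return "\n".join(shifted) + "\n"


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:01.680 --> 00:00:03.360\n"
        "Microsoft is to empower every\n"
    )
    # A negative offset makes captions appear earlier.
    print(shift_vtt(sample, -300))
```

Only lines containing `-->` are touched, so `NOTE` comments and cue text pass through unchanged. Because times are clamped at zero, a cue near the start of the file may become shorter than it was originally. This only shifts timing; merging the short cues into full sentences would be a separate step that edits both the content and the timestamps.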