google-chrome-extension ffmpeg aac webm web-mediarecorder

Duration is inconsistent when extracting AAC from WebM

When I extract AAC audio from a WebM file, the durations do not match: the AAC is ten minutes shorter. The gap differs from one WebM video to another.

The WebM video is generated by a Chrome extension using chrome.tabCapture.capture.

code:

chrome.tabCapture.capture({
  video: true,
  audio: true,
  videoConstraints: {
    mandatory: {
      minWidth: 1920,
      minHeight: 1080,
      maxWidth: 1920,
      maxHeight: 1080,
      maxFrameRate: 30,
      minFrameRate: 30,
    }
  }
})

The code above yields a stream, which I pass to JavaScript's MediaRecorder and finally save as a WebM file.

code:

new MediaRecorder(stream, {
  audioBitsPerSecond: 128000,
  videoBitsPerSecond: 2500000,
  mimeType: 'video/webm;codecs=vp9'
})

If the code above is unfamiliar, that is fine; the key settings are:

width: 1920
height: 1080
FPS: 30
audioBits: 128000
videoBits: 2500000
mimeType: video/webm;codecs=vp9

I tried many approaches, including the following:

# 1
ffmpeg -i ./source.webm -y -fflags +genpts -max_muxing_queue_size 99999 -r 15 -crf 30 -filter:v crop=750:560:0:0 ./x.mp4
ffmpeg -i ./x.mp4 -y -vn -acodec libfdk_aac -b:a 200k ./x.aac

# 2
ffmpeg -i ./source.webm -y -vn -acodec libfdk_aac -b:a 200k ./x.aac

# 3
ffmpeg -i ./source.webm -y -vn -acodec libfdk_aac -b:a 200k -map 0 ./x.aac

# 4
ffmpeg -i ./source.webm -y -max_muxing_queue_size 99999 -r 15 -crf 30 -filter:v crop=750:560:0:0 ./x.mp4
ffmpeg -i ./source.webm -y -vn -acodec aac -b:a 200k ./x.aac

# etc.

All of them failed, without exception. I have been stuck on this problem for four days.

webm file download url: https://drive.google.com/file/d/1m4fC1hU-tXFPOZayrYCs-yteSTxw_TaW/view?usp=sharing

Solution

Many conferencing and web recording apps do not store audio at all when the input is missing or silent (as defined by some volume threshold). WebM and MP4 are time-indexed containers, so the media data carries the correct timestamps for playback and editing. Raw .mp3 or .aac streams do not, so without timestamps the duration is only that of the audio actually recorded and stored. An additional issue is that the duration shown by ffmpeg -i in.aac is an estimate based on the file size and the nominal bitrate. For a VBR stream, this estimate can be wrong.
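
To make the estimate issue concrete, here is a minimal sketch of the arithmetic, assuming the estimate is simply the file size in bits divided by the assumed bitrate. The function name and the sample numbers are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate duration (whole seconds) from a file size
# and an assumed constant bitrate.
# Usage: estimate_duration_seconds SIZE_BYTES BITRATE_BPS
estimate_duration_seconds() {
  size_bytes=$1
  bitrate_bps=$2
  echo $(( size_bytes * 8 / bitrate_bps ))
}

# A 15,000,000-byte file read as 200 kb/s looks like 600 s ...
estimate_duration_seconds 15000000 200000
# ... but the same bytes at an average of 160 kb/s would be 750 s.
estimate_duration_seconds 15000000 160000
```

The same file yields two different durations depending on which bitrate is assumed, which is why a size-based estimate should not be trusted for a VBR stream.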

Either store and work with the audio in a container with timestamps, such as MP4 or MKV:

ffmpeg -i ./x.mp4 -y -vn -acodec libfdk_aac -b:a 200k ./x_audio.mp4

or fill the timestamp gaps with silence:

ffmpeg -i ./x.mp4 -y -vn -af aresample=async=1:first_pts=0:min_hard_comp=0.01 -acodec libfdk_aac -b:a 200k ./x.aac

The output of this latter command may still report a wrong estimated duration, but an audio editor will show the correct duration once it has generated peaks.
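
One way to check the real length without opening an editor is to count the AAC frames instead of trusting the reported estimate. The sketch below assumes an AAC-LC stream, where each frame carries 1024 samples, so the result is approximate for other profiles. The function names are hypothetical, and the 48000 Hz sample rate in the usage line is only an example.

```shell
#!/bin/sh
# Hypothetical helper: count the audio packets (AAC frames) actually
# present in the first audio stream of a file.
count_audio_packets() {
  ffprobe -v error -select_streams a:0 -count_packets \
    -show_entries stream=nb_read_packets -of csv=p=0 "$1"
}

# Hypothetical helper: convert a frame count to seconds.
# Assumes AAC-LC (1024 samples per frame).
# Usage: aac_duration_from_frames FRAME_COUNT SAMPLE_RATE_HZ
aac_duration_from_frames() {
  awk -v n="$1" -v sr="$2" 'BEGIN { printf "%.3f\n", n * 1024 / sr }'
}

# Example (requires ffprobe and an existing ./x.aac):
#   aac_duration_from_frames "$(count_audio_packets ./x.aac)" 48000
```

If the value computed this way matches the WebM duration, the gaps were filled and only the displayed estimate is off.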