
ffmpeg - merge mp3 and mp4 (duration difference)


I'm trying to merge an mp4 file and an mp3 file with ffmpeg. The mp4 is 9.800 s long and the mp3 is 58.540 s, so I'm using the -shortest option. Command:

ffmpeg -i video.mp4 -i audio.mp3 -c:v libx264 -c:a aac -strict experimental -shortest output.mp4

After that I got an output.mp4 with a duration of 9.846 s. Where is my error? Why is the output video longer than the source (9.846 s versus 9.800 s)?

Source mp4 MediaInfo:

General
Complete name                  : F:\video test\video.mp4
Format                         : MPEG-4
Format profile                 : Base Media
Codec ID                       : iso5 (iso5/dash)
File size                      : 3.19 MiB
Duration                       : 9 s 800 ms
Overall bit rate               : 2 732 kb/s
Encoded date                   : UTC 2017-11-24 20:53:53
Tagged date                    : UTC 2017-11-24 20:53:53

Video
ID                             : 1
Format                         : AVC
Format/Info                    : Advanced Video Codec
Format profile                 : [email protected]
Format settings                : CABAC / 4 Ref Frames
Format settings, CABAC         : Yes
Format settings, ReFrames      : 4 frames
Codec ID                       : avc1
Codec ID/Info                  : Advanced Video Coding
Duration                       : 9 s 800 ms
Bit rate                       : 2 729 kb/s
Maximum bit rate               : 3 766 kb/s
Width                          : 1 280 pixels
Height                         : 720 pixels
Display aspect ratio           : 16:9
Frame rate mode                : Constant
Frame rate                     : 25.000 FPS
Color space                    : YUV
Chroma subsampling             : 4:2:0
Bit depth                      : 8 bits
Scan type                      : Progressive
Bits/(Pixel*Frame)             : 0.118
Stream size                    : 3.19 MiB (100%)
Writing library                : x264 core 146
Encoding settings              : cabac=1 / ref=3 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=7 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=250 / keyint_min=25 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=crf / mbtree=1 / crf=23.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00
Tagged date                    : UTC 2017-11-24 20:53:53

Source mp3 MediaInfo:

General
Complete name                  : F:\video test\audio.mp3
Format                         : MPEG Audio
File size                      : 1.19 MiB
Duration                       : 58 s 540 ms
Overall bit rate mode          : Variable
Overall bit rate               : 170 kb/s
Writing library                : LAME3.99r

Audio
Format                         : MPEG Audio
Format version                 : Version 1
Format profile                 : Layer 3
Format settings                : Joint stereo / MS Stereo
Duration                       : 58 s 540 ms
Bit rate mode                  : Variable
Bit rate                       : 170 kb/s
Minimum bit rate               : 32.0 kb/s
Channel(s)                     : 2 channels
Sampling rate                  : 44.1 kHz
Frame rate                     : 38.281 FPS (1152 SPF)
Compression mode               : Lossy
Stream size                    : 1.19 MiB (100%)
Writing library                : LAME3.99r
Encoding settings              : -m j -V 2 -q 0 -lowpass 18.5 --vbr-new -b 32

Console output:

ffmpeg version 3.4 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 7.2.0 (GCC)
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-bzlib --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-cuda --enable-cuvid --enable-d3d11va --enable-nvenc --enable-dxva2 --enable-avisynth --enable-libmfx
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4':
  Metadata:
    major_brand     : iso5
    minor_version   : 1
    compatible_brands: iso5dash
    creation_time   : 2017-11-24T20:53:53.000000Z
  Duration: 00:00:09.80, start: 0.000000, bitrate: 2732 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 2259 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
    Metadata:
      handler_name    : VideoHandler
Input #1, mp3, from 'audio.mp3':
  Duration: 00:00:58.54, start: 0.025057, bitrate: 170 kb/s
    Stream #1:0: Audio: mp3, 44100 Hz, stereo, s16p, 170 kb/s
    Metadata:
      encoder         : LAME3.99r
    Side data:
      replaygain: track gain - -2.200000, track peak - unknown, album gain - unknown, album peak - unknown, 
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (mp3 (native) -> aac (native))
Press [q] to stop, [?] for help
[libx264 @ 00000000005ab440] using SAR=1/1
[libx264 @ 00000000005ab440] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 00000000005ab440] profile High, level 3.1
[libx264 @ 00000000005ab440] 264 - core 152 r2851 ba24899 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
  Metadata:
    major_brand     : iso5
    minor_version   : 1
    compatible_brands: iso5dash
    encoder         : Lavf57.83.100
    Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p(progressive), 1280x720 [SAR 1:1 DAR 16:9], q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc57.107.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc57.107.100 aac
    Side data:
      replaygain: track gain - -2.200000, track peak - unknown, album gain - unknown, album peak - unknown, 
frame=   54 fps=0.0 q=28.0 size=       0kB time=00:00:00.04 bitrate=   8.3kbits/s speed=0.0927x    
frame=   80 fps= 80 q=28.0 size=       0kB time=00:00:01.09 bitrate=   0.4kbits/s speed=1.09x    
frame=   98 fps= 65 q=28.0 size=     256kB time=00:00:01.83 bitrate=1143.5kbits/s speed=1.21x    
frame=  119 fps= 59 q=28.0 size=     512kB time=00:00:02.67 bitrate=1570.9kbits/s speed=1.32x    
frame=  144 fps= 56 q=28.0 size=     768kB time=00:00:03.66 bitrate=1715.0kbits/s speed=1.42x    
frame=  167 fps= 52 q=28.0 size=    1024kB time=00:00:04.57 bitrate=1833.9kbits/s speed=1.44x    
frame=  190 fps= 51 q=28.0 size=    1280kB time=00:00:05.50 bitrate=1905.5kbits/s speed=1.47x    
frame=  218 fps= 51 q=28.0 size=    1792kB time=00:00:06.64 bitrate=2210.6kbits/s speed=1.56x    
frame=  242 fps= 50 q=28.0 size=    2048kB time=00:00:07.56 bitrate=2216.4kbits/s speed=1.58x    
frame=  245 fps= 41 q=-1.0 Lsize=    3045kB time=00:00:09.82 bitrate=2539.6kbits/s speed=1.65x    
video:2880kB audio:156kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.298058%
[libx264 @ 00000000005ab440] frame I:14    Avg QP:20.01  size: 39750
[libx264 @ 00000000005ab440] frame P:106   Avg QP:23.85  size: 14578
[libx264 @ 00000000005ab440] frame B:125   Avg QP:24.63  size:  6770
[libx264 @ 00000000005ab440] consecutive B-frames: 22.9% 22.0% 15.9% 39.2%
[libx264 @ 00000000005ab440] mb I  I16..4: 16.7% 80.3%  3.0%
[libx264 @ 00000000005ab440] mb P  I16..4: 10.2% 36.2%  1.1%  P16..4: 25.0%  7.9%  2.5%  0.0%  0.0%    skip:17.1%
[libx264 @ 00000000005ab440] mb B  I16..4:  2.3%  5.8%  0.2%  B16..8: 31.4%  6.5%  0.9%  direct: 3.7%  skip:49.2%  L0:51.8% L1:44.5% BI: 3.7%
[libx264 @ 00000000005ab440] 8x8 transform intra:76.1% inter:86.3%
[libx264 @ 00000000005ab440] coded y,uvDC,uvAC intra: 38.3% 52.1% 9.0% inter: 12.3% 20.1% 0.2%
[libx264 @ 00000000005ab440] i16 v,h,dc,p: 30% 28%  9% 33%
[libx264 @ 00000000005ab440] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 36% 23% 19%  3%  3%  4%  4%  4%  4%
[libx264 @ 00000000005ab440] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 21% 14%  5%  7%  7%  6%  5%  3%
[libx264 @ 00000000005ab440] i8c dc,h,v,p: 45% 24% 25%  6%
[libx264 @ 00000000005ab440] Weighted P-Frames: Y:13.2% UV:6.6%
[libx264 @ 00000000005ab440] ref P L0: 71.7% 12.5% 12.9%  2.7%  0.2%
[libx264 @ 00000000005ab440] ref B L0: 92.8%  6.3%  0.9%
[libx264 @ 00000000005ab440] ref B L1: 98.3%  1.7%
[libx264 @ 00000000005ab440] kb/s:2406.56
[aac @ 00000000005adde0] Qavg: 511.420

The ffprobe -show_packets output is too big to post here, so I uploaded it to pastebin: https://pastebin.com/TYSMdceS


Solution

  • The quick answer to your question is that FFmpeg's native aac encoder writes an extra AAC priming packet at the beginning, starting at -0.0232 s, and that adds to your duration. I will try to give a detailed answer later if that would help. You can check this yourself with ffprobe -show_packets output.mp4.

    I looked into the packet dump you shared. Your video packets look like this:

    dts: -0.08 | pts: 0.0
    dts: -0.04 | pts: 0.12
    dts:  0.0  | pts: 0.04
    dts:  0.04 | pts: 0.08
    dts:  0.08 | pts: 0.24
    ...
    dts:  9.64 | pts: 9.76
    dts:  9.68 | pts: 9.72
    

    The back-and-forth pts values are most likely because you have B-frames: the display order is I B B P, but the packets are stored in decode order. Your video stream is 25 fps, so one frame lasts 0.04 s. That makes your video 9.76 + 0.04 (frame duration) = 9.8 s long.
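
The video arithmetic above can be checked with a short sketch. It assumes only the pts values quoted from the packet dump and the 25 fps rate reported in the ffmpeg log; the function name `video_duration` is made up for illustration and is not part of any FFmpeg tool.

```python
from fractions import Fraction

def video_duration(pts_values, fps):
    """Return the end time of the last displayed frame.

    Packets are stored in decode order, so the largest pts (not the
    last one in the list) marks the final frame shown on screen.
    """
    frame_duration = Fraction(1, fps)
    return max(pts_values) + frame_duration

# pts values quoted from the packet dump, in decode order
pts = [Fraction("0.0"), Fraction("0.12"), Fraction("0.04"),
       Fraction("0.08"), Fraction("0.24"), Fraction("9.76"), Fraction("9.72")]

duration = video_duration(pts, fps=25)
print(float(duration))  # 9.8
```

Exact fractions are used here so that 9.76 + 0.04 comes out as exactly 9.8 rather than a float that is off in the last digit.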

    Your original audio is longer than the video, so it is truncated so that the last packet ends at 9.80 s or later. Your audio packets look like this:

    pts: -0.023220 (AAC priming data)
    pts:  0.0
    pts:  0.023220
    ...
    pts:  9.775601 | duration: 0.023220
    pts:  9.798821 | duration: 0.023175
    

    Your last audio packet has to end at 9.80 s or after, which is why the packet starting at 9.798821 is accepted. So the duration of the audio muxed into the AV stream is 0.02322 (priming packet) + 9.798821 + 0.023175 (duration) = 9.845216 s.
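
A minimal sketch of that sum, using only the numbers quoted above. The helper name `muxed_audio_span` is hypothetical, and the link between the 0.02322 s priming packet and one 1024-sample AAC frame at 44100 Hz is an inference from the values in the log, not something ffmpeg states directly.

```python
AAC_FRAME_SAMPLES = 1024   # samples per AAC-LC frame
SAMPLE_RATE = 44100        # Hz, from the ffmpeg log

def muxed_audio_span(priming, last_pts, last_duration):
    """Time from the start of the priming packet to the end of the last packet."""
    return priming + last_pts + last_duration

# Duration of one AAC frame at this sample rate
frame_time = AAC_FRAME_SAMPLES / SAMPLE_RATE

span = muxed_audio_span(priming=0.02322, last_pts=9.798821, last_duration=0.023175)

print(round(frame_time, 6))     # 0.02322
print(round(span, 6))           # 9.845216
print(round(9.846 - span, 6))   # 0.000784
```

The last line is the unexplained remainder against the reported 9.846 s, a little under one millisecond.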

    I am not sure where the extra 0.001 s comes from; someone else may be able to comment. I do see skip data at the beginning:

    [SIDE_DATA]
    side_data_type=Skip Samples
    skip_samples=1024
    discard_padding=0
    skip_reason=0
    discard_reason=0
    [/SIDE_DATA]
    

    I hope this helps.
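
For anyone who wants to repeat this analysis on their own file, here is a rough sketch that parses the default `[PACKET]` text output of `ffprobe -show_packets` and reports where each stream ends. The function names are hypothetical, and the sample text is hand-written from the values quoted in this answer (it is not a copy of the pastebin dump, and which packet carries the side data is assumed).

```python
def parse_packets(text):
    """Parse `ffprobe -show_packets` default output into a list of dicts."""
    packets, current, depth = [], None, 0
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[PACKET]":
            current = {}
        elif line == "[/PACKET]":
            packets.append(current)
            current = None
        elif line.startswith("[/"):
            depth -= 1          # leaving a nested section such as SIDE_DATA
        elif line.startswith("["):
            depth += 1          # entering a nested section
        elif current is not None and depth == 0 and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return packets

def stream_end(packets, codec_type):
    """Latest pts_time + duration_time among packets of one stream type."""
    return max(float(p["pts_time"]) + float(p["duration_time"])
               for p in packets if p["codec_type"] == codec_type)

# Trimmed, illustrative sample; real dumps carry more fields per packet.
SAMPLE = """
[PACKET]
codec_type=audio
pts_time=-0.023220
duration_time=0.023220
[SIDE_DATA]
side_data_type=Skip Samples
skip_samples=1024
[/SIDE_DATA]
[/PACKET]
[PACKET]
codec_type=audio
pts_time=9.798821
duration_time=0.023175
[/PACKET]
[PACKET]
codec_type=video
pts_time=9.760000
duration_time=0.040000
[/PACKET]
"""

packets = parse_packets(SAMPLE)
print(round(stream_end(packets, "audio"), 6))  # 9.821996
print(round(stream_end(packets, "video"), 6))  # 9.8
```

To use it on a real file, save the output of ffprobe -show_packets output.mp4 to a text file and pass its contents to `parse_packets`.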