Tags: r, ffmpeg, png, mp4, wav

Incomplete video when using ffmpeg to weave WAV and PNG


I am trying to "weave" PNG image files with a WAV file. Here is the ffmpeg command that I used:

'/opt/homebrew/Cellar/ffmpeg/5.1.2_6/bin/ffmpeg' -y -f concat -safe 0 -i '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_input_NQIg95oycVnb.txt' -i '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_audio_ngNm5vr2vFta.wav' -c:v libx264 -c:a aac -ac 2    -shortest -fps_mode auto -pix_fmt yuv420p  -vf fps=5,\"scale=trunc(iw/2)*2:trunc(ih/2)*2\"   -strict experimental -max_muxing_queue_size 9999 -threads 2 '/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T//Rtmp8gnKo7/file46521ffd4ef8.mp4'

While ffmpeg is doing its thing, I get this long message:

ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2_6 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Input #0, concat, from '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_input_NQIg95oycVnb.txt':
  Duration: 00:00:37.00, start: 0.000000, bitrate: 0 kb/s
  Stream #0:0: Video: png, rgb24(pc), 6000x3375 [SAR 23622:23622 DAR 16:9], 25 fps, 25 tbr, 25 tbn
Input #1, wav, from '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_audio_ngNm5vr2vFta.wav':
  Duration: 00:00:37.00, bitrate: 352 kb/s
  Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels (FL), s16, 352 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
  Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[libx264 @ 0x13c9084e0] using SAR=3374/3375
[libx264 @ 0x13c9084e0] using cpu capabilities: ARMv8 NEON
[libx264 @ 0x13c9084e0] profile High, level 6.0, 4:2:0, 8-bit
[libx264 @ 0x13c9084e0] 264 - core 164 r3095 baee400 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=2 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=5 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to '/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T//Rtmp8gnKo7/file46521ffd4ef8.mp4':
  Metadata:
    encoder         : Lavf59.27.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 6000x3374 [SAR 3374:3375 DAR 16:9], q=2-31, 5 fps, 10240 tbn
    Metadata:
      encoder         : Lavc59.37.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 22050 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc59.37.100 aac
frame=  150 fps= 11 q=-1.0 Lsize=     658kB time=00:00:30.00 bitrate= 179.7kbits/s speed=2.21x    
video:286kB audio:364kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.219465%
[libx264 @ 0x13c9084e0] frame I:2     Avg QP: 2.27  size: 77096
[libx264 @ 0x13c9084e0] frame P:38    Avg QP:12.33  size:  1831
[libx264 @ 0x13c9084e0] frame B:110   Avg QP:12.66  size:   622
[libx264 @ 0x13c9084e0] consecutive B-frames:  2.0%  0.0%  2.0% 96.0%
[libx264 @ 0x13c9084e0] mb I  I16..4: 95.0%  2.4%  2.5%
[libx264 @ 0x13c9084e0] mb P  I16..4:  0.0%  0.0%  0.0%  P16..4:  0.1%  0.0%  0.0%  0.0%  0.0%    skip:99.9%
[libx264 @ 0x13c9084e0] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8:  0.1%  0.0%  0.0%  direct: 0.0%  skip:99.9%  L0:60.3% L1:39.7% BI: 0.0%
[libx264 @ 0x13c9084e0] 8x8 transform intra:3.0% inter:3.7%
[libx264 @ 0x13c9084e0] coded y,uvDC,uvAC intra: 1.7% 0.4% 0.4% inter: 0.0% 0.0% 0.0%
[libx264 @ 0x13c9084e0] i16 v,h,dc,p: 99%  1%  1%  0%
[libx264 @ 0x13c9084e0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 79%  2% 19%  0%  0%  0%  0%  0%  0%
[libx264 @ 0x13c9084e0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 47% 16% 24%  2%  2%  3%  3%  2%  2%
[libx264 @ 0x13c9084e0] i8c dc,h,v,p: 99%  1%  0%  0%
[libx264 @ 0x13c9084e0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x13c9084e0] ref P L0: 87.6%  1.9%  9.4%  1.1%
[libx264 @ 0x13c9084e0] ref B L0: 56.8% 41.9%  1.3%
[libx264 @ 0x13c9084e0] ref B L1: 97.9%  2.1%
[libx264 @ 0x13c9084e0] kb/s:77.92
[aac @ 0x13c9098c0] Qavg: 60691.250

The output, an MP4 video file, isn't what I expected. I expected all 6 PNG images to appear alongside my full combined WAV audio, but the video shows only 5 PNG images and the audio is cut short.

Here is the content of the txt file, ari_input_NQIg95oycVnb.txt:

file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide1.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide2.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide3.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide4.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide5.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide6.png'
duration 7

I wanted to attach my WAV file and PNG files to make this question more reproducible, but couldn't, so I'll point you to my GitHub issue, which contains 6 separate WAV files and 6 PNG files.


Solution

  • The duration directive in the concat demuxer does not work the way its name suggests

    Rather than setting how long the current entry is displayed, it offsets the start time of the next entry relative to the current entry. The last entry here has no next entry, so it is shown for only one frame length. When a duration is meant to apply to the last image, list that image a second time after its duration line, i.e.

    file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide6.png'
    duration 7
    file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide6.png'
    

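    The audio is shortened because you have -shortest declared and the video stream terminates at 30 s (time=00:00:30.00 in the log, i.e. 150 frames at 5 fps), so the output is cut where the shorter stream ends.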
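
Since the list file in the question is generated by a script, the fix can be applied at generation time. The following is a minimal Python sketch, not the asker's actual code: the function names `build_concat_list` and `video_length_without_repeat` are hypothetical, and it assumes the slide paths contain no single quotes (which would need escaping in a concat list). The second function only approximates the result, ignoring the single frame the last slide still receives.

```python
def build_concat_list(slides, durations):
    """Return concat-demuxer list text with the last slide listed twice.

    Hypothetical helper: `slides` are image paths, `durations` are seconds.
    Assumes paths contain no single quotes.
    """
    if not slides or len(slides) != len(durations):
        raise ValueError("need at least one slide and one duration per slide")
    lines = []
    for slide, seconds in zip(slides, durations):
        lines.append(f"file '{slide}'")
        lines.append(f"duration {seconds}")
    # Repeat the final entry without a duration so the preceding
    # `duration` line has a following entry whose start it can offset.
    lines.append(f"file '{slides[-1]}'")
    return "\n".join(lines) + "\n"


def video_length_without_repeat(durations):
    """Approximate video length when the last entry is NOT repeated.

    The final duration has no next entry to offset, so it is effectively
    dropped (the last slide's single frame is ignored here).
    """
    return sum(durations[:-1])


if __name__ == "__main__":
    slides = [f"slide{i}.png" for i in range(1, 7)]  # placeholder file names
    durations = [6, 6, 6, 6, 6, 7]                   # values from the question
    print(build_concat_list(slides, durations))
    print(video_length_without_repeat(durations))    # 30, matching time=00:00:30.00
```

With the durations from the question, the unrepeated list yields about 30 s of video, which is where -shortest cuts the 37 s audio. With the last slide repeated, the video should run the full 37 s.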