I am trying to "weave" PNG image files with a WAV file. Here is the ffmpeg
command that I used:
'/opt/homebrew/Cellar/ffmpeg/5.1.2_6/bin/ffmpeg' -y -f concat -safe 0 -i '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_input_NQIg95oycVnb.txt' -i '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_audio_ngNm5vr2vFta.wav' -c:v libx264 -c:a aac -ac 2 -shortest -fps_mode auto -pix_fmt yuv420p -vf fps=5,\"scale=trunc(iw/2)*2:trunc(ih/2)*2\" -strict experimental -max_muxing_queue_size 9999 -threads 2 '/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T//Rtmp8gnKo7/file46521ffd4ef8.mp4'
While ffmpeg is doing its thing, I get this long message:
ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
built with Apple clang version 14.0.0 (clang-1400.0.29.202)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2_6 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
Input #0, concat, from '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_input_NQIg95oycVnb.txt':
Duration: 00:00:37.00, start: 0.000000, bitrate: 0 kb/s
Stream #0:0: Video: png, rgb24(pc), 6000x3375 [SAR 23622:23622 DAR 16:9], 25 fps, 25 tbr, 25 tbn
Input #1, wav, from '/private/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T/Rtmp8gnKo7/ari_audio_ngNm5vr2vFta.wav':
Duration: 00:00:37.00, bitrate: 352 kb/s
Stream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, 1 channels (FL), s16, 352 kb/s
Stream mapping:
Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Stream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[libx264 @ 0x13c9084e0] using SAR=3374/3375
[libx264 @ 0x13c9084e0] using cpu capabilities: ARMv8 NEON
[libx264 @ 0x13c9084e0] profile High, level 6.0, 4:2:0, 8-bit
[libx264 @ 0x13c9084e0] 264 - core 164 r3095 baee400 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=2 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=5 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to '/var/folders/bb/m2b0ry595ys7bfs1r397lnf40000gp/T//Rtmp8gnKo7/file46521ffd4ef8.mp4':
Metadata:
encoder : Lavf59.27.100
Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 6000x3374 [SAR 3374:3375 DAR 16:9], q=2-31, 5 fps, 10240 tbn
Metadata:
encoder : Lavc59.37.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 22050 Hz, stereo, fltp, 128 kb/s
Metadata:
encoder : Lavc59.37.100 aac
frame= 150 fps= 11 q=-1.0 Lsize= 658kB time=00:00:30.00 bitrate= 179.7kbits/s speed=2.21x
video:286kB audio:364kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.219465%
[libx264 @ 0x13c9084e0] frame I:2 Avg QP: 2.27 size: 77096
[libx264 @ 0x13c9084e0] frame P:38 Avg QP:12.33 size: 1831
[libx264 @ 0x13c9084e0] frame B:110 Avg QP:12.66 size: 622
[libx264 @ 0x13c9084e0] consecutive B-frames: 2.0% 0.0% 2.0% 96.0%
[libx264 @ 0x13c9084e0] mb I I16..4: 95.0% 2.4% 2.5%
[libx264 @ 0x13c9084e0] mb P I16..4: 0.0% 0.0% 0.0% P16..4: 0.1% 0.0% 0.0% 0.0% 0.0% skip:99.9%
[libx264 @ 0x13c9084e0] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 0.1% 0.0% 0.0% direct: 0.0% skip:99.9% L0:60.3% L1:39.7% BI: 0.0%
[libx264 @ 0x13c9084e0] 8x8 transform intra:3.0% inter:3.7%
[libx264 @ 0x13c9084e0] coded y,uvDC,uvAC intra: 1.7% 0.4% 0.4% inter: 0.0% 0.0% 0.0%
[libx264 @ 0x13c9084e0] i16 v,h,dc,p: 99% 1% 1% 0%
[libx264 @ 0x13c9084e0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 79% 2% 19% 0% 0% 0% 0% 0% 0%
[libx264 @ 0x13c9084e0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 47% 16% 24% 2% 2% 3% 3% 2% 2%
[libx264 @ 0x13c9084e0] i8c dc,h,v,p: 99% 1% 0% 0%
[libx264 @ 0x13c9084e0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x13c9084e0] ref P L0: 87.6% 1.9% 9.4% 1.1%
[libx264 @ 0x13c9084e0] ref B L0: 56.8% 41.9% 1.3%
[libx264 @ 0x13c9084e0] ref B L1: 97.9% 2.1%
[libx264 @ 0x13c9084e0] kb/s:77.92
[aac @ 0x13c9098c0] Qavg: 60691.250
The output, an MP4 video file, isn't what I expected. I expected all 6 PNG images combined with my full WAV file, but the video shows only 5 PNG images and the audio is cut short.
Here is the content of the txt file, ari_input_NQIg95oycVnb.txt:
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide1.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide2.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide3.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide4.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide5.png'
duration 6
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide6.png'
duration 7
I wanted to attach my WAV file and PNG files to make this question more reproducible, but couldn't, so I'll redirect you to my GitHub issue, which contains 6 separate WAV files and 6 PNG files.
The duration directive in the concat demuxer is not intuitively designed. It actually offsets the start time of the next entry relative to the current entry, so the last entry here is shown for only one frame length. That matches your log: the five 6-second durations put slide6 at 30 s, and the encode stops at time=00:00:30.00. When duration is meant to apply to the last image, duplicate that last entry, i.e.
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide6.png'
duration 7
file '/Users/howardbaek/Documents/sandbox/loqui/png_files/slide6.png'
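If the list file is generated by a script, the duplication can be done there. Below is a minimal Python sketch of that idea, not the code of whatever tool produced ari_input_NQIg95oycVnb.txt. The function name build_concat_list and the short slide paths are hypothetical, and it assumes the paths contain no single quotes, which would need escaping.

```python
def build_concat_list(entries):
    """Return concat-demuxer list text for (path, seconds) pairs.

    The final path is written a second time, without a duration line,
    so the last `duration` has a following entry to offset.
    Assumption: paths contain no single quotes.
    """
    if not entries:
        raise ValueError("need at least one (path, seconds) pair")
    lines = []
    for path, seconds in entries:
        lines.append(f"file '{path}'")
        lines.append(f"duration {seconds}")
    # Repeat the last image so its duration is honoured.
    lines.append(f"file '{entries[-1][0]}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical slide paths mirroring the list in the question.
    slides = [(f"slide{i}.png", 6) for i in range(1, 6)] + [("slide6.png", 7)]
    print(build_concat_list(slides), end="")
```

The resulting text can be written to the file passed to -f concat -i; nothing else in the ffmpeg command needs to change.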
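The audio is shortened because you have -shortest declared and the video terminates at 30 s. With the last entry duplicated, the video should run for the full 37 s, the same length as the WAV file, so the audio is no longer cut.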
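The arithmetic behind the 30 s cut-off can be checked with a short Python sketch. It is a simplified model, not ffmpeg's actual logic: it ignores the one extra frame of the last image, and the helper names entry_start_times and muxed_length are made up for illustration.

```python
from itertools import accumulate


def entry_start_times(durations):
    """Start time of each entry when each duration offsets the next entry."""
    return [0] + list(accumulate(durations))[:-1]


def muxed_length(video_seconds, audio_seconds):
    """Simplified model of -shortest: output ends with the shorter stream."""
    return min(video_seconds, audio_seconds)


durations = [6, 6, 6, 6, 6, 7]   # values from the list file in the question
audio_seconds = 37               # WAV duration reported in the log

# Original list: the video effectively ends where the last slide starts.
original_video = entry_start_times(durations)[-1]
# List with the last entry duplicated: the final duration is honoured.
fixed_video = sum(durations)

print(original_video, muxed_length(original_video, audio_seconds))
print(fixed_video, muxed_length(fixed_video, audio_seconds))
```

Under these assumptions the original list gives a 30 s video and therefore 30 s of audio, while the list with the duplicated entry gives 37 s for both.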