FFMPEG - Concat 3 videos with one of the videos becoming a picture in picture overlay

I have been getting to grips with FFMPEG for the last few days...so please excuse my lack of knowledge. It's very much early days.

I need to join 3 video elements together with one of the videos becoming an overlay at a specific time.

intro.mp4

mainvideo.mp4

endboard.mp4

I need intro.mp4 to bolt on to the front of mainvideo.mp4. Then, ideally with 20 seconds to go before the end of mainvideo.mp4, I need endboard.mp4 to be bolted on to the sequence and take over the frame. When this happens, I need mainvideo.mp4 to be overlaid in the top left corner and continue playing seamlessly through the transition.

I also need the audio from the main video to play until the end of the video.

I currently achieve this by putting all of the video elements into Premiere and exporting them, but I know this process can be much quicker with FFMPEG. For reference, here is an example of how it looks. If you skip to the end of the video below (just after 45 mins into the video), as the credits are rolling you will see the main video transition to the picture in picture overlay and the endboard video take over the main frame.

https://www.youtube.com/watch?v=RtgIvWxZUwM&t=2723s

There will be lots of mainvideo.mp4 files that this will be applied to individually, and the lengths of these videos will always be different. I am hoping that there is a way to have the transition to the endboard.mp4 happen relative to 20secs before the end of the files. If not I guess I would have to manually input the time I want this change over transition to happen.

I roughly understand in theory what needs to be done, but being so new to this world I am really unsure of how something this complicated would be pieced together.

If there is anyone out there that can help me, it would be greatly appreciated!

I have got my head around the process of merging videos together with a simple concat command and I can see that overlaying a video in the top left corner of the frame is also possible...but my brain cannot figure out the sequence of events that needs to happen to bolt the intro video on to the main video....and then have the main video transition into the picture in picture overlay video at a specific time, while also bolting on the endboard video for the main video to overlay onto.

Any help for a complete newb would be so unbelievably appreciated!

Solution

Safe way (one ffmpeg run, full video reencoding)

There are different ways to do it, but I think the straightforward one is to split mainvideo into two parts, resize the second part and overlay it onto endboard. Then concat intro, the first part of mainvideo and the PIP endboard together, and pack the result with the concatenated audio from intro and mainvideo. Since the duration of mainvideo may vary, your script should detect it to define the trim point. ffmpeg can also seek relative to the end of the input with the -sseof option, but that route leaves you with an intermediate file. This approach does the whole job without intermediate files:

#!/bin/bash

mainvideo=mainvideo.mp4
tailtime=20
# Probe the duration of the main video in seconds
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$mainvideo")

ffmpeg -hide_banner -y \
  -i intro.mp4 \
  -i "$mainvideo" \
  -i endboard.mp4 \
  -filter_complex \
    "[0:v]setpts=PTS-STARTPTS[v_intro]; \
     [0:a]asetpts=PTS-STARTPTS[a_intro]; \
     [1:v]setpts=PTS-STARTPTS,split[v_main1][v_main2]; \
     [1:a]asetpts=PTS-STARTPTS[a_main]; \
     [2:v]setpts=PTS-STARTPTS[v_endboard]; \
     [v_main1]select='gt(t,$duration-$tailtime)',scale=w=iw/2:h=ih/2,setpts=PTS-STARTPTS[v_tail]; \
     [v_endboard][v_tail]overlay[v_pip]; \
     [v_main2]select='lte(t,$duration-$tailtime)',setpts=PTS-STARTPTS[v_mid]; \
     [v_intro][v_mid][v_pip]concat=n=3:v=1:a=0[v_out]; \
     [a_intro][a_main]concat=n=2:v=0:a=1[a_out]" \
  -map "[v_out]" \
  -map "[a_out]" \
  -r 25 \
  output.mp4
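
For comparison, the seek-from-end option mentioned above is -sseof, an input option that takes a negative position measured from the end of the file. The following is only a minimal sketch of cutting the last 20 seconds that way without probing the duration first. The function name extract_tail and the output name tail.mp4 are made up for this example. It leaves an intermediate file behind, and with -c copy the cut usually snaps to a keyframe, so the tail may start a little earlier than requested.

```shell
# extract_tail is a hypothetical helper name, not part of ffmpeg.
# Usage: extract_tail INPUT SECONDS OUTPUT
extract_tail() {
  local input=$1 seconds=$2 output=$3
  # -sseof takes a negative position relative to the end of the input,
  # so "-20" means "start 20 seconds before EOF".
  ffmpeg -hide_banner -y -sseof "-$seconds" -i "$input" -c copy "$output"
}

# Example call (assumes mainvideo.mp4 exists in the current directory):
# extract_tail mainvideo.mp4 20 tail.mp4
```

The resulting tail.mp4 could then be scaled and overlaid onto endboard in a second ffmpeg run, at the cost of one extra file on disk.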

Semi-safe way (few ffmpeg runs, tail encoding, zsh)

To avoid full video re-encoding you may use something like this:

setopt interactivecomments

# Input parameters
mainvideo=mainvideo.mp4
endboard=endboard.mp4
intro=intro.mp4
tailtime=20

# Time calculations to define cut point
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 $mainvideo)
midtime=$(echo "scale=2;$duration-$tailtime" | bc)

# Safety check
tbn_main=$(ffprobe -v error -select_streams v -show_entries stream=time_base -of default=noprint_wrappers=1:nokey=1 $mainvideo)
tbn_main=${tbn_main#*/}
tbn_intro=$(ffprobe -v error -select_streams v -show_entries stream=time_base -of default=noprint_wrappers=1:nokey=1 $intro)
tbn_intro=${tbn_intro#*/}
tbn_end=$(ffprobe -v error -select_streams v -show_entries stream=time_base -of default=noprint_wrappers=1:nokey=1 $endboard)
tbn_end=${tbn_end#*/}

# Compare each timebase with mainvideo's directly; an average can match by coincidence
if [[ $tbn_intro -ne $tbn_main || $tbn_end -ne $tbn_main ]]; then
  echo "WARNING: source video files have different timebases."
  echo "The use of the concat demuxer will produce incorrect output."
  echo "Re-encoding is highly recommended."
  read -s -k $'?Press any key to exit.\n'
  exit 1
fi

# Trim the main part of mainvideo
ffmpeg -hide_banner -y -i $mainvideo -to $midtime -c copy mid.mp4

# Trim the tail of mainvideo and overlay it onto endboard
ffmpeg -hide_banner -y \
  -i $mainvideo \
  -i $endboard \
  -filter_complex \
    "[0:v]select='gt(t,$duration-$tailtime)',scale=w=iw/2:h=ih/2,setpts=PTS-STARTPTS[v_tail]; \
     [0:a]aselect='gt(t,$duration-$tailtime)',asetpts=PTS-STARTPTS[a_out]; \
     [1:v][v_tail]overlay=format=auto[v_out]" \
  -map "[v_out]" \
  -map "[a_out]" \
  -video_track_timescale $tbn_main \
  pip.mp4

# Pass all parts through the concat demuxer
[ -f filelist.txt ] && rm filelist.txt
for f in $intro mid.mp4 pip.mp4; do echo "file '$PWD/$f'" >> filelist.txt; done
ffmpeg -hide_banner -y -f concat -safe 0 -i filelist.txt -c copy output.mp4

# Clean up intermediate files
rm mid.mp4 pip.mp4 filelist.txt

I've included a timebase check of the source video streams to warn when the concat demuxer method is unsuitable. If you ignore this warning, you'll most likely get an incorrect concatenation result and a lot of ffmpeg warnings like "Non-monotonous DTS in output stream...". For the same reason I've added the video_track_timescale option to the command that generates pip.mp4.
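
The timebase comparison can also be written as two small helpers. This is a minimal sketch: the names video_timebase and same_timebase are made up here, the ffprobe call is the same one used in the script above (restricted to the first video stream with v:0), and it assumes each file has at least one video stream. Each value is compared directly against the first one.

```shell
# Hypothetical helpers; the names are not part of ffmpeg or ffprobe.

# Print the denominator of the first video stream's time base,
# e.g. 12800 when ffprobe reports "1/12800".
video_timebase() {
  local tb
  tb=$(ffprobe -v error -select_streams v:0 -show_entries stream=time_base \
         -of default=noprint_wrappers=1:nokey=1 "$1")
  printf '%s\n' "${tb#*/}"
}

# Succeed only if every argument equals the first one.
same_timebase() {
  local first=$1 tb
  for tb in "$@"; do
    [ "$tb" = "$first" ] || return 1
  done
  return 0
}

# Example call (assumes the three files exist):
# same_timebase "$(video_timebase intro.mp4)" \
#               "$(video_timebase mainvideo.mp4)" \
#               "$(video_timebase endboard.mp4)" || echo "timebases differ"
```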

You can use both methods (full re-encoding and partial) if you wish, using the if-then-else from the second method as a wrapper.
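
One way the decision part of such a wrapper might look, as a rough sketch only: choose_method and cut_point are hypothetical names, and the printed labels "copy" and "reencode" simply stand in for the second and first scripts above. The sketch also guards against a main video that is not longer than the tail, a case the scripts above do not handle.

```shell
# Hypothetical helpers for a wrapper script; names are illustrative only.

# Print the cut point (duration minus tail) with two decimals,
# or fail if the video is not longer than the tail.
cut_point() {
  awk -v d="$1" -v t="$2" 'BEGIN { if (d <= t) exit 1; printf "%.2f\n", d - t }'
}

# Print "copy" when all timebase denominators match (concat demuxer usable),
# otherwise "reencode" (fall back to the single-run filter_complex method).
choose_method() {
  local first=$1 tb
  for tb in "$@"; do
    if [ "$tb" != "$first" ]; then
      echo reencode
      return 0
    fi
  done
  echo copy
}

# Example:
# cut_point 125.48 20               # prints 105.48
# choose_method 12800 12800 12800   # prints copy
```

Since there will be many mainvideo files, a loop over them could call cut_point and choose_method for each file and then run whichever method was selected.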