When extracting a part of a video with ffmpeg, without reencoding, the output is choppy

I extract the part 00:02:00.000 to 00:03:30.000 of a video with ffmpeg:

ffmpeg -i input.mp4 -ss 120 -t 90 -vcodec copy -acodec copy output.mp4

It works but the output video is jerky/choppy, i.e. not really smooth/fluid. Why?

The only log I have is:

[mp4 @ 000000000041c700] track 1: codec frame size is not set
[mp4 @ 000000000041c700] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[mp4 @ 000000000041c700] pts has no value
     Last message repeated 7894 times

But when watching the original video and navigating to 00:02:00.000, the video is perfectly fluid/smooth. How to fix this?

Solution

A video stream consists of chunks of video, each a few seconds or a few frames long. Each chunk starts with a keyframe, which contains the entire image; the frames that follow only store changes relative to earlier frames. Whenever the scene changes or there is too much change between frames, the encoder generates a new keyframe, and the subsequent frames are again just changes.

When you use -c:v copy -ss 123.45, chances are the timestamp you specify doesn't coincide with a keyframe. The copied stream then begins with frames whose keyframe is missing, so the player has no complete image to apply those changes to. In effect it starts from a blank frame, and the changes make little sense until the next keyframe arrives, hence the choppy output.

The fix for this is one of the following:
(1) Re-encode (transcode) the video instead of copying it.
(2) Identify where the keyframes are and align your start timestamp to a keyframe. You can write a script to detect the keyframes and set your timestamps from there, but afaik it can't be done in one command.

For example, you could list the keyframe timestamps (in seconds, one per line) with something like:

ffprobe -i INPUT.mp4 -select_streams v -show_entries frame=pts_time -of csv=p=0 -skip_frame nokey > frames.txt
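Once frames.txt exists, picking the start timestamp can be scripted. The following is a minimal sketch, assuming frames.txt holds one keyframe timestamp in seconds per line, as written by the ffprobe command above. The function name last_keyframe_at_or_before is made up for this example, and the ffmpeg call is only shown as a commented usage line.

```shell
#!/bin/sh
# Hypothetical helper: print the latest keyframe timestamp that is <= the
# requested start time.
#   $1 = file with one keyframe timestamp (seconds) per line, e.g. frames.txt
#   $2 = requested start time in seconds
last_keyframe_at_or_before() {
    awk -F',' -v t="$2" '
        # Use only the first CSV field, in case a line has trailing commas.
        $1 != "" && ($1 + 0) <= (t + 0) {
            if (best == "" || ($1 + 0) > (best + 0)) best = $1
        }
        END { if (best != "") print best }
    ' "$1"
}

# Example usage (commented out so that sourcing this file runs nothing):
#   start=$(last_keyframe_at_or_before frames.txt 120)
#   ffmpeg -ss "$start" -i input.mp4 -t 90 -c copy output.mp4
```

The clip then starts slightly earlier than requested, so extend -t by the difference if the end point matters.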

(3) A third option is to combine both approaches:

(a) fetch the timestamp of the first keyframe after your -ss timestamp
(b) re-encode the portion from your timestamp up to that keyframe as one video
(c) copy the portion from that keyframe onwards as another video
(d) concatenate the two videos

This keeps CPU/GPU time to a minimum and preserves as much quality as possible, because only the short portion before the first keyframe is re-encoded.
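The third option could be scripted roughly as follows. This is an untested sketch, not a definitive implementation: it assumes a keyframe list like frames.txt (one timestamp in seconds per line), H.264 video and AAC audio in the source, and that the requested range contains at least one keyframe. The function names, the intermediate file names (head.mp4, tail.mp4, parts.txt) and the encoder choices are placeholders. The re-encoded part has to match the original stream parameters closely, otherwise the joined file may not play back cleanly.

```shell
#!/bin/sh
# Hypothetical helper: print the earliest keyframe timestamp that is >= the
# requested time. $1 = keyframe list (seconds, one per line), $2 = time.
first_keyframe_at_or_after() {
    awk -F',' -v t="$2" '
        $1 != "" && ($1 + 0) >= (t + 0) {
            if (best == "" || ($1 + 0) < (best + 0)) best = $1
        }
        END { if (best != "") print best }
    ' "$1"
}

# Hypothetical helper: print $1 - $2 with six decimal places.
sub_seconds() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.6f\n", a - b }'
}

# Sketch of option (3).
# Arguments: input file, start (s), duration (s), keyframe list, output file.
smart_cut() {
    src=$1; start=$2; dur=$3; frames=$4; out=$5

    kf=$(first_keyframe_at_or_after "$frames" "$start")
    [ -n "$kf" ] || { echo "no keyframe at or after $start" >&2; return 1; }

    head_dur=$(sub_seconds "$kf" "$start")      # span that must be re-encoded
    tail_dur=$(sub_seconds "$dur" "$head_dur")  # span that can be copied

    # The start already sits on a keyframe: a plain copy is enough.
    if [ "$head_dur" = "0.000000" ]; then
        ffmpeg -ss "$start" -i "$src" -t "$dur" -c copy "$out"
        return
    fi

    # Part 1: re-encode only the short span before the first keyframe.
    # Encoder choices are assumptions; adapt them to the source stream.
    ffmpeg -ss "$start" -i "$src" -t "$head_dur" -c:v libx264 -c:a aac head.mp4 || return 1

    # Part 2: stream-copy everything from the keyframe onwards.
    ffmpeg -ss "$kf" -i "$src" -t "$tail_dur" -c copy tail.mp4 || return 1

    # Part 3: join both parts with the concat demuxer, without re-encoding.
    printf "file 'head.mp4'\nfile 'tail.mp4'\n" > parts.txt
    ffmpeg -f concat -safe 0 -i parts.txt -c copy "$out"
}

# Example usage (commented out so that sourcing this file runs nothing):
#   smart_cut input.mp4 120 90 frames.txt output.mp4
```

Only the two small helpers are pure text processing; the ffmpeg calls depend on the input file, so check the joined output for glitches at the seam before relying on it.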