
Inconsistent frame number with ffmpeg


I regularly run into an issue with hvc1 videos where the number of frames reported by ffprobe differs from the number reported by FFmpeg. I would like to know what causes this and whether it can be solved without re-encoding the video.

I wrote a sample script using a test video I have.

I split the video into 5-second segments. ffprobe reports the expected length for each segment, but FFmpeg reports 3 frames fewer than expected on every segment except the first.

The issue is exactly the same if I split by 10 seconds or any other interval: I always lose 3 frames.

I also noticed that the first segment is always 3 frames shorter (according to ffprobe) than the others, and it is the only one where the two tools agree.

Here is the script I wrote to test this issue:

# Get the total number of video frames, first with ffprobe, then with ffmpeg
total_num_frames=$(ffprobe -v quiet -show_entries stream=nb_read_packets -count_packets -select_streams v:0 -print_format json test_video.mp4 | jq -r '.streams[0].nb_read_packets')
echo "$total_num_frames"
ffmpeg -hwaccel cuda -i test_video.mp4 -vsync 2 -f null -

# Check that the ffprobe packet counts of the segments add up consistently
rm -rf clips && mkdir clips && \
ffmpeg -i test_video.mp4 -acodec copy -f segment -vcodec copy -reset_timestamps 1 -segment_time 5 -map 0 clips/part_%d.mp4
count_frames=0
for i in {0..5}
do
    # jq -r prints the raw value, so no quotes need to be stripped
    num_packets=$(ffprobe -v quiet -show_entries stream=nb_read_packets -count_packets -select_streams v:0 -print_format json "clips/part_$i.mp4" | jq -r '.streams[0].nb_read_packets')
    count_frames=$((count_frames + num_packets))
    # Columns: packets in this segment, running total, total in the source
    echo "$num_packets $count_frames $total_num_frames"
done

The output is the following:

3597
ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
  configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test_video.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf58.29.100
  Duration: 00:00:59.95, start: 0.035000, bitrate: 11797 kb/s
    Stream #0:0(und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, bt709), 1920x1080, 11692 kb/s, 60.01 fps, 60 tbr, 19200 tbn, 19200 tbc (default)
    Metadata:
      handler_name    : Core Media Video
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 91 kb/s (default)
    Metadata:
      handler_name    : Core Media Audio
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> wrapped_avframe (native))
  Stream #0:1 -> #0:1 (aac (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, null, to 'pipe:':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2mp41
    encoder         : Lavf58.29.100
    Stream #0:0(und): Video: wrapped_avframe, nv12, 1920x1080, q=2-31, 200 kb/s, 60 fps, 60 tbn, 60 tbc (default)
    Metadata:
      handler_name    : Core Media Video
      encoder         : Lavc58.54.100 wrapped_avframe
    Stream #0:1(und): Audio: pcm_s16le, 44100 Hz, mono, s16, 705 kb/s (default)
    Metadata:
      handler_name    : Core Media Audio
      encoder         : Lavc58.54.100 pcm_s16le
frame= 3597 fps=788 q=-0.0 Lsize=N/A time=00:00:59.95 bitrate=N/A speed=13.1x    
video:1883kB audio:5162kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

The loop then prints:

297 297 3597
300 597 3597
300 897 3597
300 1197 3597
300 1497 3597
300 1797 3597 <--- the counts are consistent according to ffprobe

But if I then check the frame count of each segment with FFmpeg, using the following command:

ffmpeg -hwaccel cuda -i clips/part_$i.mp4 -vsync 2 -f null - 

For part 0 it's OK:

frame=  297 fps=0.0 q=-0.0 Lsize=N/A time=00:00:04.95 bitrate=N/A speed=12.5x    
video:155kB audio:424kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

For all other parts the count is inconsistent; it should be 300:

frame=  297 fps=0.0 q=-0.0 Lsize=N/A time=00:00:04.95 bitrate=N/A speed=12.3x    
video:155kB audio:423kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

The issue is exactly the same with any other interval size. For example, with 10-second segments I get the following frame counts:

ffprobe 597 - 600 ...
ffmpeg 597 597 ...

I thought it could be related to the source being VFR rather than CFR, but I tried converting the input to CFR and nothing changed.

I also tried forcing a keyframe every second, to check whether it was a keyframe issue, with the following argument: -force_key_frames "expr:gte(t,n_forced*1)". The problem is exactly the same.

What am I doing wrong? This happens to me a lot with hvc1 files, and I really don't know how to deal with it.


Solution

  • The source of the differences is that FFprobe counts the discarded packets, and FFmpeg doesn't count the discarded packets as frames.


    Your results are consistent with a video stream that was created with 3 B-Frames (3 consecutive B-Frames for every P-Frame or I-Frame).

    According to Wikipedia:

    I‑frames are the least compressible but don't require other video frames to decode.
    P‑frames can use data from previous frames to decompress and are more compressible than I‑frames.
    B‑frames can use both previous and forward frames for data reference to get the highest amount of data compression.

    When splitting a video with P-Frames and B-Frames into segments without re-encoding, the dependency chain breaks.

    • There are (almost) always frames that depend on frames from the previous segment or the next segment.
    • Those frames are kept, but the matching packets are marked as "discarded" (with the AV_PKT_FLAG_DISCARD flag).

    To work on the same dataset, we may build a synthetic video (to be used as input).

    Build the synthetic video with the following command:

    ffmpeg -y -r 60 -f lavfi -i testsrc=size=384x256:rate=1 -vf "setpts=N/60/TB" -g 60 -vcodec libx265 -x265-params crf=28:bframes=3:b-adapt=0 -tag:v hvc1 -pix_fmt yuv420p -t 20 test_video.mp4
    
    • -g 60 sets the GOP size to 60 frames (a key frame is inserted every 60 frames).
    • bframes=3:b-adapt=0 forces 3 consecutive B-Frames.

    To verify the pattern of I/P/B frames, we may use FFprobe:

    ffprobe -i test_video.mp4 -show_frames -show_entries frame=pict_type
    

    The output looks like:

    pict_type=I
    pict_type=B
    pict_type=B
    pict_type=B
    pict_type=P
    pict_type=B
    pict_type=B
    pict_type=B
    ...
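Reading a long list of pict_type lines by eye is error-prone, so a small tally can help. The following is only a sketch: count_pict_types is a hypothetical helper name (not part of FFmpeg), and it assumes FFprobe's default output format with one pict_type=X line per frame.

```shell
# Hypothetical helper: tally picture types from "pict_type=X" lines on stdin.
# Lines that are not pict_type entries (e.g. [FRAME] wrappers) are ignored.
count_pict_types() {
    awk -F= '$1 == "pict_type" { n[$2]++ }
             END { printf "I=%d P=%d B=%d\n", n["I"], n["P"], n["B"] }'
}

# Usage (assumes test_video.mp4 exists and ffprobe is on PATH):
# ffprobe -v error -select_streams v:0 -show_frames -show_entries frame=pict_type test_video.mp4 | count_pict_types
```

With the synthetic video above, the B count should be roughly three times the combined I and P count if the encoder really emitted 3 consecutive B-Frames.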


    Segment the video by time (5 seconds per segment):

    ffmpeg -i test_video.mp4 -f segment -vcodec copy -reset_timestamps 1 -segment_time 5 clips/part_%d.mp4
    

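    FFprobe counting (columns: packets in segment, running total, total frames in the source; the running total starts at 1200 rather than 0, presumably because the counter was not reset, so only the first column matters here):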
    297 1497 1200
    300 1797 1200
    300 2097 1200
    303 2400 1200

    FFmpeg counting:
    frame= 297
    frame= 297
    frame= 297
    frame= 300

    As you can see, the result is consistent with your output.
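Lining the two listings up segment by segment makes the gap explicit. This is just a sketch of the arithmetic: the numbers are copied from the listings above, and segment_gaps is a hypothetical helper name.

```shell
# Per-segment counts copied from the two listings above.
ffprobe_counts="297 300 300 303"
ffmpeg_counts="297 297 297 300"

# Hypothetical helper: print both counts and their difference per segment.
segment_gaps() {
    # The positional parameters hold the ffmpeg counts; shift walks them
    # in step with the loop over the ffprobe counts.
    set -- $ffmpeg_counts
    i=0
    for probe in $ffprobe_counts; do
        echo "part_$i ffprobe=$probe ffmpeg=$1 gap=$((probe - $1))"
        shift
        i=$((i + 1))
    done
}

segment_gaps
```

The first segment shows a gap of 0 and each later segment a gap of 3, which matches the number of consecutive B-Frames.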


    We may identify the "discarded" packets using FFprobe:

    ffprobe -i clips/part_1.mp4 -show_packets
    

    Look for flags=_D.
    A packet with flags=_D is marked as "discarded".
    Note: In a video stream, every packet corresponds to one frame.

    FFprobe output begins with:
    flags=K_
    flags=_D
    flags=_D
    flags=_D
    flags=__
    flags=__
    flags=__
    ...

    For every middle segment, 3 packets are marked as "discarded", and that is the reason for the 3 missing frames in FFmpeg compared to FFprobe.
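To check this on a whole segment instead of only the first few packets, the flags can be counted. The following is a minimal sketch: count_discarded is a hypothetical helper name, and it assumes FFprobe's default output with one flags=... line per packet. It matches any D in the flags field rather than the exact string _D, in case a build prints additional flag characters.

```shell
# Hypothetical helper: count packets and "discarded" packets from
# "flags=..." lines on stdin. Under the one-packet-per-frame note above,
# packets minus discarded is the frame count FFmpeg is expected to report.
count_discarded() {
    awk -F= '$1 == "flags" { total++; if ($2 ~ /D/) discarded++ }
             END { printf "packets=%d discarded=%d decodable=%d\n",
                          total, discarded, total - discarded }'
}

# Usage (assumes clips/part_1.mp4 exists and ffprobe is on PATH):
# ffprobe -v error -select_streams v:0 -show_packets -show_entries packet=flags clips/part_1.mp4 | count_discarded
```

Restricting the input to the video stream with -select_streams v:0 matters here, because -show_packets would otherwise also list the audio packets.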