Libavformat/FFMPEG: Muxing into mp4 with AVFormatContext drops the final frame, depending on the number of frames


I am trying to use libavformat to create a .mp4 video with a single h.264 video stream, but the final frame in the resulting file often has a duration of zero and is effectively dropped from the video. Strangely, whether the final frame is dropped depends on how many frames I add to the file. The simple testing that I outline below makes me think that I am misconfiguring either the AVFormatContext or the h.264 encoder, resulting in two edit lists that sometimes cut off the final frame. I also post a simplified version of the code I am using, in case I'm making some obvious mistake. Any help would be greatly appreciated: I've been struggling with this issue for the past few days and have made little progress.

I can recover the dropped frame by creating a new mp4 container using the ffmpeg binary with the copy codec if I use the -ignore_editlist option. Inspecting the file with a missing frame using ffprobe, mp4trackdump, or mp4file --dump shows that the final frame is dropped if its sample time is exactly the same as the end of the edit list. When I make a file that has no dropped frames, it still has two edit lists: the only difference is that the end time of the edit list is beyond all samples in files that do not have dropped frames. Though this is hardly a fair comparison, if I make a .png for each frame and then generate a .mp4 with ffmpeg using the image2 demuxer and similar h.264 settings, I produce a movie with all frames present, only one edit list, and similar PTS times to my mangled movies with two edit lists. In this case, the edit list always ends after the last frame/sample time.

I am using this command to determine the number of frames in the resulting stream, though I also get the same number with other utilities:

ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of default=nokey=1:noprint_wrappers=1 video_file_name.mp4

Simple inspection of the file with ffprobe shows no obviously alarming signs to me, besides the framerate being affected by the missing frame (the target was 24):

$ ffprobe -hide_banner testing.mp4
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'testing.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:00:04.13, start: 0.041016, bitrate: 724 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 100x100, 722 kb/s, 24.24 fps, 24 tbr, 12288 tbn, 48 tbc (default)
    Metadata:
      handler_name    : VideoHandler

The files that I generate programmatically always have two edit lists, one of which is very short. In files both with and without a missing frame, the duration of one of the frames is 0, while all the others have the same duration (512). You can see this in the ffmpeg output for this file, which I tried to put 100 frames into: only 99 are visible, despite the file containing all 100 samples.

$ ffmpeg -hide_banner -y -v 9 -loglevel 99 -i testing.mp4  
...
<edited to remove the class printing>
type:'edts' parent:'trak' sz: 48 100 948
type:'elst' parent:'edts' sz: 40 8 40
track[0].edit_count = 2
duration=41 time=-1 rate=1.000000
duration=4125 time=0 rate=1.000000
type:'mdia' parent:'trak' sz: 808 148 948
type:'mdhd' parent:'mdia' sz: 32 8 800
type:'hdlr' parent:'mdia' sz: 45 40 800
ctype=[0][0][0][0]
stype=vide
type:'minf' parent:'mdia' sz: 723 85 800
type:'vmhd' parent:'minf' sz: 20 8 715
type:'dinf' parent:'minf' sz: 36 28 715
type:'dref' parent:'dinf' sz: 28 8 28
Unknown dref type 0x206c7275 size 12
type:'stbl' parent:'minf' sz: 659 64 715
type:'stsd' parent:'stbl' sz: 151 8 651
size=135 4CC=avc1 codec_type=0
type:'avcC' parent:'stsd' sz: 49 8 49
type:'stts' parent:'stbl' sz: 32 159 651
track[0].stts.entries = 2
sample_count=99, sample_duration=512
sample_count=1, sample_duration=0
...
AVIndex stream 0, sample 99, offset 5a0ed, dts 50688, size 3707, distance 0, keyframe 1
Processing st: 0, edit list 0 - media time: -1, duration: 504
Processing st: 0, edit list 1 - media time: 0, duration: 50688
type:'udta' parent:'moov' sz: 98 1072 1162
...

The last frame has zero duration:

$ mp4trackdump -v testing.mp4
...
mp4file testing.mp4, track 1, samples 100, timescale 12288
sampleId      1, size  6943 duration      512 time        0 00:00:00.000 S
sampleId      2, size  3671 duration      512 time      512 00:00:00.041 S
...
sampleId     99, size  3687 duration      512 time    50176 00:00:04.083 S
sampleId    100, size  3707 duration        0 time    50688 00:00:04.125 S
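
To make the timing concrete, here is a small sketch (plain Julia, no libav needed) that expands the stts entries from the log above into per-sample start times. The variable names are only for illustration; the numbers are copied from the output above.

```julia
# stts entries as (sample_count, sample_duration), copied from the log above
stts = [(99, 512), (1, 0)]

# Expand the run-length encoded entries into one duration per sample
durations = reduce(vcat, [fill(d, n) for (n, d) in stts])

# Each sample starts where the previous one ended
sample_times = cumsum(durations) .- durations

# Total media duration implied by the stts box
track_duration = sum(durations)

println("last sample starts at ", sample_times[end])
println("track duration is     ", track_duration)
```

Because the last sample has zero duration, its start time and the total duration implied by the stts box are the same number (50688 ticks), so the media timeline has no room left in which to display the final frame.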

Non-mangled videos that I generate have a similar structure, as you can see in this video that had 99 input frames, all of which are visible in the output. Even though the sample_duration is set to zero for one of the samples in the stts box, it is not dropped from the frame count or when reading the frames back in with ffmpeg.

$ ffmpeg -hide_banner -y -v 9 -loglevel 99 -i testing_99.mp4  
...
type:'elst' parent:'edts' sz: 40 8 40
track[0].edit_count = 2
duration=41 time=-1 rate=1.000000
duration=4084 time=0 rate=1.000000
...
track[0].stts.entries = 2
sample_count=98, sample_duration=512
sample_count=1, sample_duration=0
...
AVIndex stream 0, sample 98, offset 5d599, dts 50176, size 3833, distance 0, keyframe 1
Processing st: 0, edit list 0 - media time: -1, duration: 504
Processing st: 0, edit list 1 - media time: 0, duration: 50184
...
$ mp4trackdump -v testing_99.mp4
...
sampleId     98, size  3814 duration      512 time    49664 00:00:04.041 S
sampleId     99, size  3833 duration        0 time    50176 00:00:04.083 S

One difference that jumps out to me is that the mangled file's second edit list ends at time 50688, which coincides with the last sample, while the non-mangled file's edit list ends at 50184, which is after the time of the last sample at 50176. As I mentioned before, whether the last frame is clipped depends on the number of frames I encode and mux into the container: 100 input frames results in 1 dropped frame, 99 results in 0, 98 in 0, 97 in 1, etc...
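
The pattern can be reproduced with a little arithmetic. The sketch below is not libav code: `last_frame_dropped` is a hypothetical helper, the movie timescale of 1000 is inferred from the elst durations in the logs (4125 and 4084), and the rounding directions (up when writing the edit list, down when reading it back) are assumptions chosen to match the logged values 50688 and 50184.

```julia
# Values taken from the logs above
const MEDIA_TIMESCALE = 12288  # "tbn" reported by ffprobe
const MOVIE_TIMESCALE = 1000   # assumed timescale of the elst durations
const SAMPLE_DURATION = 512    # ticks per frame at 24 fps

# Hypothetical helper: does the edit list end at (or before) the last sample?
function last_frame_dropped(nframe::Integer)
    # With a zero-duration final sample, the track ends where that sample starts
    last_sample_time = (nframe - 1) * SAMPLE_DURATION
    # Assumed: duration is rounded up when converted to the movie timescale
    edit_duration = cld(last_sample_time * MOVIE_TIMESCALE, MEDIA_TIMESCALE)
    # Assumed: converted back to media ticks when the file is read
    edit_end = fld(edit_duration * MEDIA_TIMESCALE, MOVIE_TIMESCALE)
    return edit_end <= last_sample_time
end

for n in 97:100
    println(n, " frames: last frame dropped = ", last_frame_dropped(n))
end
```

Under these assumptions the final frame is lost exactly when (nframe - 1) / 24 seconds is a whole number of milliseconds, so the rounding adds no slack past the last sample. That holds for 97 and 100 frames but not for 98 or 99, which matches the behavior described above.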

Here is the code that I used to generate these files, which is an MWE script version of library functions that I am modifying. It is written in Julia, which I do not think is important here, and calls the FFMPEG library version 4.3.1. It's more or less a direct translation of the FFMPEG muxing demo, although the codec context here is created before the format context. I am presenting the code that interacts with ffmpeg first, although it relies on some helper code that I will put below.

The helper code just makes it easier to work with nested C structs in Julia, and allows . syntax in Julia to be used in place of C's arrow (->) operator for field access of struct pointers. Libav structs such as AVFrame appear as a thin wrapper type AVFramePtr, and similarly AVStream appears as AVStreamPtr etc... These act like single or double pointers for the purposes of function calls, depending on the function's type signature. Hopefully it will be clear enough to understand if you are familiar with working with libav in C, and I don't think looking at the helper code should be necessary if you don't want to run the code.

# Function to transfer array to AVPicture/AVFrame
function transfer_img_buf_to_frame!(frame, img)
    img_pointer = pointer(img)
    data_pointer = frame.data[1] # Base-1 indexing, get pointer to first data buffer in frame
    for h = 1:frame.height
        data_line_pointer = data_pointer + (h-1) * frame.linesize[1] # base-1 indexing
        img_line_pointer = img_pointer + (h-1) * frame.width
        unsafe_copyto!(data_line_pointer, img_line_pointer, frame.width) # copy one row of frame.width bytes
    end
end

# Function to transfer AVFrame to AVCodecContext, and AVPacket to AVFormatContext
function encode_mux!(packet, format_context, frame, codec_context; flush = false)
    if flush
        fret = avcodec_send_frame(codec_context, C_NULL)
    else
        fret = avcodec_send_frame(codec_context, frame)
    end
    if fret < 0 && !in(fret, [-Libc.EAGAIN, VIO_AVERROR_EOF])
        error("Error $fret sending a frame for encoding")
    end

    pret = Cint(0)
    while pret >= 0
        pret = avcodec_receive_packet(codec_context, packet)
        if pret == -Libc.EAGAIN || pret == VIO_AVERROR_EOF
             break
        elseif pret < 0
            error("Error $pret during encoding")
        end
        stream = format_context.streams[1] # Base-1 indexing
        av_packet_rescale_ts(packet, codec_context.time_base, stream.time_base)
        packet.stream_index = 0
        ret = av_interleaved_write_frame(format_context, packet)
        ret < 0 && error("Error muxing packet: $ret")
    end
    if !flush && fret == -Libc.EAGAIN && pret != VIO_AVERROR_EOF
        fret = avcodec_send_frame(codec_context, frame)
        if fret < 0 && fret != VIO_AVERROR_EOF
            error("Error $fret sending a frame for encoding")
        end
    end
    return pret
end

# Set parameters of test movie
nframe = 100
width, height = 100, 100
framerate = 24
gop = 0
codec_name = "libx264"
filename = "testing.mp4"

((width % 2 !=0) || (height % 2 !=0)) && error("Encoding error: Image dims must be a multiple of two")

# Make test images
imgstack = map(x->rand(UInt8,width,height),1:nframe);

pix_fmt = AV_PIX_FMT_GRAY8
framerate_rat = Rational(framerate)

codec = avcodec_find_encoder_by_name(codec_name)
codec == C_NULL && error("Codec '$codec_name' not found")

# Allocate AVCodecContext
codec_context_p = avcodec_alloc_context3(codec) # raw pointer
codec_context_p == C_NULL && error("Could not allocate AVCodecContext")
# Easier to work with pointer that acts like a c struct pointer, type defined below
codec_context = AVCodecContextPtr(codec_context_p)

codec_context.width = width
codec_context.height = height
codec_context.time_base = AVRational(1/framerate_rat)
codec_context.framerate = AVRational(framerate_rat)
codec_context.pix_fmt = pix_fmt
codec_context.gop_size = gop

ret = avcodec_open2(codec_context, codec, C_NULL)
ret < 0 && error("Could not open codec: Return code $(ret)")

# Allocate AVFrame and wrap it in a Julia convenience type
frame_p = av_frame_alloc()
frame_p == C_NULL && error("Could not allocate AVFrame")
frame = AVFramePtr(frame_p)

frame.format = pix_fmt
frame.width = width
frame.height = height

# Allocate picture buffers for frame
ret = av_frame_get_buffer(frame, 0)
ret < 0 && error("Could not allocate the video frame data")

# Allocate AVPacket and wrap it in a Julia convenience type
packet_p = av_packet_alloc()
packet_p == C_NULL && error("Could not allocate AVPacket")
packet = AVPacketPtr(packet_p)

# Allocate AVFormatContext and wrap it in a Julia convenience type
format_context_dp = Ref(Ptr{AVFormatContext}()) # double pointer
ret = avformat_alloc_output_context2(format_context_dp, C_NULL, C_NULL, filename)
if ret != 0 || format_context_dp[] == C_NULL
    error("Could not allocate AVFormatContext")
end
format_context = AVFormatContextPtr(format_context_dp)

# Add video stream to AVFormatContext and configure it to use the encoder made above
stream_p = avformat_new_stream(format_context, C_NULL)
stream_p == C_NULL && error("Could not allocate output stream")
stream = AVStreamPtr(stream_p) # Wrap this pointer in a convenience type

stream.time_base = codec_context.time_base
stream.avg_frame_rate = 1 / convert(Rational, stream.time_base)
ret = avcodec_parameters_from_context(stream.codecpar, codec_context)
ret < 0 && error("Could not set parameters of stream")

# Open the AVIOContext
pb_ptr = field_ptr(format_context, :pb)
# This following is just a call to avio_open, with a bit of extra protection
# so the Julia garbage collector does not destroy format_context during the call
ret = GC.@preserve format_context avio_open(pb_ptr, filename, AVIO_FLAG_WRITE)
ret < 0 && error("Could not open file $filename for writing")

# Write the header
ret = avformat_write_header(format_context, C_NULL)
ret < 0 && error("Could not write header")

# Encode and mux each frame
for i in 1:nframe # iterate from 1 to nframe
    img = imgstack[i] # base-1 indexing
    ret = av_frame_make_writable(frame)
    ret < 0 && error("Could not make frame writable")
    transfer_img_buf_to_frame!(frame, img)
    frame.pts = i
    encode_mux!(packet, format_context, frame, codec_context)
end

# Flush the encoder
encode_mux!(packet, format_context, frame, codec_context; flush = true)

# Write the trailer
av_write_trailer(format_context)

# Close the AVIOContext
pb_ptr = field_ptr(format_context, :pb) # get pointer to format_context.pb
ret = GC.@preserve format_context avio_closep(pb_ptr) # simply a call to avio_closep
ret < 0 && error("Could not free AVIOContext")

# Deallocation
avcodec_free_context(codec_context)
av_frame_free(frame)
av_packet_free(packet)
avformat_free_context(format_context)

Below is the helper code that makes accessing pointers to nested C structs not a total pain in Julia. If you try to run the code yourself, please run this before the code shown above. It requires VideoIO.jl, a Julia wrapper to libav.

# Convenience type and methods to make the above code look more like C
using Base: RefValue, fieldindex

import Base: unsafe_convert, getproperty, setproperty!, getindex, setindex!,
    unsafe_wrap, propertynames

# VideoIO is a Julia wrapper to libav
#
# Bring bindings to libav library functions into namespace
using VideoIO: AVCodecContext, AVFrame, AVPacket, AVFormatContext, AVRational,
    AVStream, AV_PIX_FMT_GRAY8, AVIO_FLAG_WRITE, AVFMT_NOFILE,
    avformat_alloc_output_context2, avformat_free_context, avformat_new_stream,
    av_dump_format, avio_open, avformat_write_header,
    avcodec_parameters_from_context, av_frame_make_writable, avcodec_send_frame,
    avcodec_receive_packet, av_packet_rescale_ts, av_interleaved_write_frame,
    avformat_query_codec, avcodec_find_encoder_by_name, avcodec_alloc_context3,
    avcodec_open2, av_frame_alloc, av_frame_get_buffer, av_packet_alloc,
    avio_closep, av_write_trailer, avcodec_free_context, av_frame_free,
    av_packet_free

# Submodule of VideoIO
using VideoIO: AVCodecs

# Need to import this function from Julia's Base to add more methods
import Base: convert

const VIO_AVERROR_EOF = -541478725 # AVERROR_EOF

# Methods to convert between AVRational and Julia's Rational type, because it's
# hard to access the AV rational macros with Julia's C interface
convert(::Type{Rational{T}}, r::AVRational) where T = Rational{T}(r.num, r.den)
convert(::Type{Rational}, r::AVRational) = Rational(r.num, r.den)
convert(::Type{AVRational}, r::Rational) = AVRational(numerator(r), denominator(r))

"""
    mutable struct NestedCStruct{T}

Wraps a pointer to a C struct, and acts like a double pointer to that memory.
The methods below will automatically convert it to a single pointer if needed
for a function call, and make interacting with it in Julia look (more) similar
to interacting with it in C, except '->' in C is replaced by '.' in Julia.
"""
mutable struct NestedCStruct{T}
    data::RefValue{Ptr{T}}
end
NestedCStruct{T}(a::Ptr) where T = NestedCStruct{T}(Ref(a))
NestedCStruct(a::Ptr{T}) where T = NestedCStruct{T}(a)

const AVCodecContextPtr = NestedCStruct{AVCodecContext}
const AVFramePtr = NestedCStruct{AVFrame}
const AVPacketPtr = NestedCStruct{AVPacket}
const AVFormatContextPtr = NestedCStruct{AVFormatContext}
const AVStreamPtr = NestedCStruct{AVStream}

function field_ptr(::Type{S}, struct_pointer::Ptr{T}, field::Symbol,
                           index::Integer = 1) where {S,T}
    fieldpos = fieldindex(T, field)
    field_pointer = convert(Ptr{S}, struct_pointer) +
        fieldoffset(T, fieldpos) + (index - 1) * sizeof(S)
    return field_pointer
end

field_ptr(a::Ptr{T}, field::Symbol, args...) where T =
    field_ptr(fieldtype(T, field), a, field, args...)

function check_ptr_valid(p::Ptr, err::Bool = true)
    valid = p != C_NULL
    err && !valid && error("Invalid pointer")
    valid
end

unsafe_convert(::Type{Ptr{T}}, ap::NestedCStruct{T}) where T =
    getfield(ap, :data)[]
unsafe_convert(::Type{Ptr{Ptr{T}}}, ap::NestedCStruct{T}) where T =
    unsafe_convert(Ptr{Ptr{T}}, getfield(ap, :data))

function check_ptr_valid(a::NestedCStruct{T}, args...) where T
    p = unsafe_convert(Ptr{T}, a)
    GC.@preserve a check_ptr_valid(p, args...)
end

nested_wrap(x::Ptr{T}) where T = NestedCStruct(x)
nested_wrap(x) = x

function getproperty(ap::NestedCStruct{T}, s::Symbol) where T
    check_ptr_valid(ap)
    p = unsafe_convert(Ptr{T}, ap)
    res = GC.@preserve ap unsafe_load(field_ptr(p, s))
    nested_wrap(res)
end

function setproperty!(ap::NestedCStruct{T}, s::Symbol, x) where T
    check_ptr_valid(ap)
    p = unsafe_convert(Ptr{T}, ap)
    fp = field_ptr(p, s)
    GC.@preserve ap unsafe_store!(fp, x)
end

function getindex(ap::NestedCStruct{T}, i::Integer) where T
    check_ptr_valid(ap)
    p = unsafe_convert(Ptr{T}, ap)
    res = GC.@preserve ap unsafe_load(p, i)
    nested_wrap(res)
end

function setindex!(ap::NestedCStruct{T}, x, i::Integer) where T # value first, then index, as Base.setindex! expects
    check_ptr_valid(ap)
    p = unsafe_convert(Ptr{T}, ap)
    GC.@preserve ap unsafe_store!(p, x, i)
end

function unsafe_wrap(::Type{T}, ap::NestedCStruct{S}, i) where {S, T}
    check_ptr_valid(ap)
    p = unsafe_convert(Ptr{S}, ap)
    GC.@preserve ap unsafe_wrap(T, p, i)
end

function field_ptr(::Type{S}, a::NestedCStruct{T}, field::Symbol,
                           args...) where {S, T}
    check_ptr_valid(a)
    p = unsafe_convert(Ptr{T}, a)
    GC.@preserve a field_ptr(S, p, field, args...)
end

field_ptr(a::NestedCStruct{T}, field::Symbol, args...) where T =
    field_ptr(fieldtype(T, field), a, field, args...)

propertynames(ap::T) where {S, T<:NestedCStruct{S}} = (fieldnames(S)...,
                                                       fieldnames(T)...)

Edit: Some things that I have already tried

  • Explicitly setting the stream duration to be the same number as the number of frames that I add, or a few more beyond that
  • Explicitly setting the stream start time to zero, while the first frame has a PTS of 1
  • Playing around with encoder parameters, as well as gop_size, using B frames, etc.
  • Setting the private data for the mov/mp4 muxer to set the movflag negative_cts_offsets
  • Changing the framerate
  • Trying different pixel formats, such as AV_PIX_FMT_YUV420P

Also, to be clear: while I can work around this problem by transferring the file into another container while ignoring the edit lists, I am hoping to not make damaged mp4 files in the first place.


Solution

  • I had a similar issue, where the final frame was missing and this caused the calculated FPS to differ from what I expected.

    It doesn't seem like you are setting AVPacket's duration field. I found that relying on automatic duration (leaving the field at 0) produces the issue you describe. If you have a constant framerate, you can calculate what the duration should be: e.g., for 25 FPS with a time base of 1/12800, set it to 512 (12800 / 25 ticks, which is 1/25 of a second). Hopefully that helps.
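
    The calculation can be sketched as follows, in Julia to match the question's code. `frame_duration_ticks` is a hypothetical helper, not a libav function, and the comment about where to set the field in `encode_mux!` is a suggestion under the assumption that the encoder leaves `packet.duration` at 0.

```julia
# Hypothetical helper: length of one frame, in ticks of `time_base`
# (`time_base` is a Rational giving seconds per tick)
frame_duration_ticks(framerate, time_base) = round(Int, (1 // framerate) / time_base)

# The example from this answer: 25 FPS with a 1/12800 time base
println(frame_duration_ticks(25, 1 // 12800))

# The question's file: 24 FPS with the 1/12288 time base reported by ffprobe
println(frame_duration_ticks(24, 1 // 12288))

# In the question's encode_mux!, one possible place to apply this is just
# before the timestamps are rescaled, while the packet is still expressed in
# codec_context.time_base (1/24 s, so one frame is one tick):
#
#     packet.duration = 1
#     av_packet_rescale_ts(packet, codec_context.time_base, stream.time_base)
#
# av_packet_rescale_ts converts a positive duration along with pts and dts.
```

    Both calls give 512 ticks, which is the per-sample duration that the other 99 samples in the question's file already have.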