
Replace Bento4 with libav / ffmpeg


We use Bento4, a really well designed SDK, to demux video files in .mov containers. Decoding is done by our own codec, so only the raw (intra-frame) samples are needed. So far this works in a fairly straightforward way:

AP4_Track *test_videoTrack = nullptr;
AP4_ByteStream *input = nullptr;
AP4_Result result = AP4_FileByteStream::Create(filename, AP4_FileByteStream::STREAM_MODE_READ, input);

AP4_File m_file (*input, true);

//
// Read movie tracks, and metadata, find the video track
size_t index = 0;
uint32_t m_width = 0, m_height = 0;
auto item = m_file.GetMovie()->GetTracks().FirstItem();
auto track = item->GetData();
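if (track->GetType() == AP4_Track::TYPE_VIDEO)
{
    // Remember the video track; without this assignment the pointer stays null
    test_videoTrack = track;

    // Track width and height are stored as 16.16 fixed-point values
    m_width = (uint32_t)((double)test_videoTrack->GetWidth() / double(1 << 16));
    m_height = (uint32_t)((double)test_videoTrack->GetHeight() / double(1 << 16));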

    std::string codec("unknown");
    auto sd = track->GetSampleDescription(0);
    AP4_String c;
    if (AP4_SUCCEEDED(sd->GetCodecString(c))) 
    {
        codec = c.GetChars();
    }

    // Find and instantiate the decoder
    AP4_Sample sample;
    AP4_DataBuffer sampleData;
    test_videoTrack->ReadSample(0, sample, sampleData);
}

For several reasons we would prefer to replace Bento4 with libav/ffmpeg (mainly because we already have it in the project and want to reduce dependencies).

How would we (preferably in pseudo-code) replace the Bento4 tasks shown above with libav? Please keep in mind that the codec we use is not part of the ffmpeg library, so we cannot follow the standard ffmpeg decoding examples. Opening the media file simply fails, and without a decoder we have not been able to get the size or any other info so far. What we need is to:

  • open the media file
  • get contained tracks (possibly also audio)
  • get track size / length info
  • get track samples by index

Solution

  • It turned out to be very easy:

    AVFormatContext* inputFile = avformat_alloc_context();
    avformat_open_input(&inputFile, filename, nullptr, nullptr);
    avformat_find_stream_info(inputFile, nullptr);
    
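    // Pick just two streams: the first video and the first audio stream.
    // Note: AVStream::codec is deprecated since FFmpeg 3.1 and removed in 5.0;
    // on newer versions read the same fields from streams[i]->codecpar.
    int videoStreamIndex = -1, audioStreamIndex = -1;
    for (unsigned int i = 0; i < inputFile->nb_streams; i++)
    {
        if (inputFile->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO && videoStreamIndex == -1)
        {
            videoStreamIndex = i;
        }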
        else if (inputFile->streams[i]->codec->codec_type == AVMEDIA_TYPE_AUDIO && audioStreamIndex == -1)
        {
            audioStreamIndex = i;
        }
    }
    

    Now test for the correct codec tag:

    // get the codec tag (FourCC) as a string and compare it with the expected one
    char ct[64] = {0};
    static const char* codec_id = "MPAK";
    av_get_codec_tag_string(ct, sizeof(ct), inputFile->streams[videoStreamIndex]->codec->codec_tag);
    assert(strncmp(ct, codec_id, strlen(codec_id)) == 0);
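
    To make the tag comparison above less opaque, here is a minimal sketch of how a FourCC maps to and from the 32-bit `codec_tag` value, assuming the layout libav uses (first character in the least-significant byte). The helpers `fourcc_to_string` and `make_fourcc` are hypothetical names for illustration and are not part of libav; only the "MPAK" tag comes from the code above.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstring>
    #include <string>

    // Hypothetical helper (not a libav function): unpack a FourCC that is
    // stored least-significant byte first into a 4-character string.
    std::string fourcc_to_string(std::uint32_t tag)
    {
        std::string s(4, '\0');
        for (int i = 0; i < 4; ++i)
            s[i] = static_cast<char>((tag >> (8 * i)) & 0xFF);
        return s;
    }

    // Hypothetical inverse: pack exactly four characters into a tag,
    // placing the first character in the low byte.
    std::uint32_t make_fourcc(const char* four_chars)
    {
        assert(std::strlen(four_chars) == 4);
        std::uint32_t tag = 0;
        for (int i = 0; i < 4; ++i)
            tag |= static_cast<std::uint32_t>(static_cast<unsigned char>(four_chars[i])) << (8 * i);
        return tag;
    }
    ```

    With helpers like these the check could also be done on the integer directly, e.g. `codec_tag == make_fourcc("MPAK")`, which avoids the string buffer; libav's own `MKTAG` macro builds the same kind of value.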
    

    I did not know that the frame size is filled in from the container metadata, so it is available even before a codec is chosen (or when none is available at all):

    // lookup size
    Size2D mediasize(inputFile->streams[videoStreamIndex]->codec->width, inputFile->streams[videoStreamIndex]->codec->height);
    

    Seeking by frame and unpacking (video) is done like this:

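    AVStream* s = inputFile->streams[videoStreamIndex];
    // frame index -> seconds (via r_frame_rate) -> units of the stream's time_base
    int64_t seek_ts = (int64_t(frame_index) * s->r_frame_rate.den * s->time_base.den) / (int64_t(s->r_frame_rate.num) * s->time_base.num);
    av_seek_frame(inputFile, videoStreamIndex, seek_ts, AVSEEK_FLAG_ANY);

    AVPacket pkt;
    // av_read_frame returns packets from all streams, so skip anything that is not video
    while (av_read_frame(inputFile, &pkt) >= 0 && pkt.stream_index != videoStreamIndex)
        av_packet_unref(&pkt);
    // call av_packet_unref(&pkt) once the sample has been handed to the decoder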
    

    Now the packet contains a frame ready to be unpacked with our own decoder.
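
    The frame-index-to-timestamp arithmetic used for seeking can be checked in isolation. The following is a sketch only: `Rational` is a stand-in for libav's `AVRational` so that it compiles without FFmpeg, and `frame_to_timestamp` is a hypothetical name. It also assumes a constant frame rate, which is what the `r_frame_rate` based formula above relies on.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Stand-in for libav's AVRational (numerator / denominator).
    struct Rational { int num; int den; };

    // Convert a zero-based frame index into a timestamp in time_base units:
    //   seconds   = frame_index * frame_rate.den / frame_rate.num
    //   timestamp = seconds * time_base.den / time_base.num
    // Everything is multiplied first and divided once to limit rounding error.
    std::int64_t frame_to_timestamp(std::int64_t frame_index, Rational frame_rate, Rational time_base)
    {
        assert(frame_rate.num > 0 && time_base.num > 0);
        const std::int64_t numerator = frame_index * frame_rate.den * time_base.den;
        const std::int64_t denominator = std::int64_t(frame_rate.num) * time_base.num;
        return numerator / denominator;
    }
    ```

    For example, at 25 fps with a time base of 1/12800, frame 10 maps to timestamp 5120; at 30000/1001 fps with a time base of 1/30000, frame 3 maps to 3003.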