FFMPEG RTSP stream to MPEG4/H264 file using libx264

Heyo folks,

I'm attempting to transcode/remux an RTSP stream in H264 format into an MPEG4 container, containing just the H264 video stream. Basically, webcam output into an MP4 container.

I can produce a poorly encoded MP4 using this code:

// Variables here for demo
AVOutputFormat * video_file_output_format = nullptr;  // returned by av_guess_format
AVFormatContext * video_file_format_context = nullptr;
AVFormatContext * rtsp_format_context = nullptr;
AVCodecContext * rtsp_vidstream_codec_context = nullptr;
AVPacket packet = {0};
AVStream * video_file_stream = nullptr;
AVCodec * rtsp_decoder_codec = nullptr;
AVCodec * video_file_encoder_codec = nullptr;
int errorNum = 0, video_stream_index = 0;
std::string outputMP4file = "D:\\somemp4file.mp4";

// begin
AVDictionary * opts = nullptr;
av_dict_set(&opts, "rtsp_transport", "tcp", 0);

if ((errorNum = avformat_open_input(&rtsp_format_context, uriANSI.c_str(), NULL, &opts)) < 0) {
    errOut << "Connection failed: avformat_open_input failed with error " << errorNum << ":\r\n" << ErrorRead(errorNum);
    TacticalAbort();
    return;
}

rtsp_format_context->max_analyze_duration = 50000;
if ((errorNum = avformat_find_stream_info(rtsp_format_context, NULL)) < 0) {
    errOut << "Connection failed: avformat_find_stream_info failed with error " << errorNum << ":\r\n" << ErrorRead(errorNum);
    TacticalAbort();
    return;
}

video_stream_index = errorNum = av_find_best_stream(rtsp_format_context, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);

if (video_stream_index < 0) {
    errOut << "Connection in unexpected state; made a connection, but there was no video stream.\r\n"
        "Attempts to find a video stream resulted in error " << errorNum << ": " << ErrorRead(errorNum);
    TacticalAbort();
    return;
}

rtsp_vidstream_codec_context = rtsp_format_context->streams[video_stream_index]->codec;

av_init_packet(&packet);

if (!(video_file_output_format = av_guess_format(NULL, outputMP4file.c_str(),  NULL))) {
    TacticalAbort();
    throw std::exception("av_guess_format");
}

if (!(rtsp_decoder_codec = avcodec_find_decoder(rtsp_vidstream_codec_context->codec_id))) {
    errOut << "Connection failed: connected, but avcodec_find_decoder returned null.\r\n"
        "Couldn't find codec with an AV_CODEC_ID value of " << rtsp_vidstream_codec_context->codec_id << ".";
    TacticalAbort();
    return;
}

video_file_format_context = avformat_alloc_context();
video_file_format_context->oformat = video_file_output_format;

if (strcpy_s(video_file_format_context->filename, sizeof(video_file_format_context->filename), outputMP4file.c_str())) {
    errOut << "Couldn't open video file: strcpy_s failed with error " << errno << ".";
    std::string log = errOut.str();
    TacticalAbort();
    throw std::exception("strcpy_s");
}

if (!(video_file_encoder_codec = avcodec_find_encoder(video_file_output_format->video_codec))) {
    TacticalAbort();
    throw std::exception("avcodec_find_encoder");
}

// MARKER ONE

if (!outputMP4file.empty() &&
    !(video_file_output_format->flags & AVFMT_NOFILE) &&
    (errorNum = avio_open2(&video_file_format_context->pb, outputMP4file.c_str(), AVIO_FLAG_WRITE, nullptr, &opts)) < 0) {
    errOut << "Couldn't open video file \"" << outputMP4file << "\" for writing : avio_open2 failed with error " << errorNum << ": " << ErrorRead(errorNum);
    TacticalAbort();
    return;
}

// Create stream in MP4 file
if (!(video_file_stream = avformat_new_stream(video_file_format_context, video_file_encoder_codec))) {
    TacticalAbort();
    return;
}

AVCodecContext * video_file_codec_context = video_file_stream->codec;

// MARKER TWO

// error -22/-21 in avio_open2 if this is skipped
if ((errorNum = avcodec_copy_context(video_file_codec_context, rtsp_vidstream_codec_context)) != 0) {
    TacticalAbort();
    throw std::exception("avcodec_copy_context");
}

//video_file_codec_context->codec_tag = 0;

/*
// MARKER 3 - is this not needed? Examples suggest not.
if ((errorNum = avcodec_open2(video_file_codec_context, video_file_encoder_codec, &opts)) < 0)
{
    errOut << "Couldn't open video file codec context: avcodec_open2 failed with error " << errorNum << ": " << ErrorRead(errorNum);
    std::string log = errOut.str();
    TacticalAbort();
    throw std::exception("avcodec_open2, video file");
}*/

//video_file_format_context->flags |= AVFMT_FLAG_GENPTS;
if (video_file_format_context->oformat->flags & AVFMT_GLOBALHEADER)
{
    video_file_codec_context->flags |= CODEC_FLAG_GLOBAL_HEADER;
}

if ((errorNum = avformat_write_header(video_file_format_context, &opts)) < 0) {
    errOut << "Couldn't open video file: avformat_write_header failed with error " << errorNum << ":\r\n" << ErrorRead(errorNum);
    std::string log = errOut.str();
    TacticalAbort();
    return;
}

However, there are several issues:

1. I can't pass any x264 options to the output file. The output H264 matches the input H264's profile/level - switching cameras to a different model switches H264 level.
2. The timing of the output file is off, noticeably.
3. The duration of the output file is off, massively. A few seconds of footage becomes hours, although playtime doesn't match. (FWIW, I'm using VLC to play them.)

Passing x264 options

If I manually increment PTS per packet, and set DTS equal to PTS, it plays too fast, ~2-3 seconds' worth of footage in one second playtime, and duration is hours long. The footage also blurs past several seconds, about 10 seconds' footage in a second.

If I let FFMPEG decide (with or without GENPTS flag), the file has a variable frame rate (probably as expected), but it plays the whole file in an instant and has a long duration too (over forty hours for a few seconds). The duration isn't "real", as the file plays in an instant.

At Marker One, I try to set the profile by passing options to avio_open2. The options are simply ignored by libx264. I've tried:

av_dict_set(&opts, "vprofile", "main", 0);
av_dict_set(&opts, "profile", "main", 0); // error, missing '('
// FF_PROFILE_H264_MAIN equals 77, so I also tried
av_dict_set(&opts, "vprofile", "77", 0); 
av_dict_set(&opts, "profile", "77", 0);

It does seem to read the profile setting, but it doesn't use it. At Marker Two, I tried to set it after the avio_open2 call, before avformat_write_header.

// I tried all 4 av_dict_set from earlier, passing it to avformat_write_header.
// None had any effect, they weren't consumed.
av_opt_set(video_file_codec_context, "profile", "77", 0);
av_opt_set(video_file_codec_context, "profile", "main", 0);
video_file_codec_context->profile = FF_PROFILE_H264_MAIN;
av_opt_set(video_file_codec_context->priv_data, "profile", "77", 0);
av_opt_set(video_file_codec_context->priv_data, "profile", "main", 0);

Messing with privdata made the program unstable, but I was trying anything at that point. I'd like to solve issue 1 with passing settings, since I imagine it'd bottleneck any attempt to solve issues 2 or 3.

I've been fiddling with this for the better part of a month now. I've been through dozens of documentation pages, Q&As and examples. It doesn't help that quite a few are outdated.

Any help would be appreciated.

Cheers

Solution

Okay, firstly, I wasn't using ffmpeg, but a fork of ffmpeg called libav. The two are easy to confuse: ffmpeg is the more up-to-date project, while libav was what some Linux distributions shipped.

Compiling for Visual Studio

Once I switched to mainline ffmpeg, I had to compile it manually again, since I was using it in Visual Studio and the only prebuilt static libraries are built with G++, so linking against them doesn't work nicely.

The official guide is https://trac.ffmpeg.org/wiki/CompilationGuide/MSVC.
The following procedure compiles fine.
First, ensure VS is in PATH. Your PATH should list these entries in this order:

C:\Program Files (x86)\Microsoft Visual Studio XX.0\VC\bin
D:\MinGW\msys64\mingw32\bin  
D:\MinGW\msys64\usr\bin  
D:\MinGW\bin

Then run the Visual Studio x86 Native Tools command prompt; it should be in your Start Menu.
In that command prompt, run:
(your path to MinGW)\msys64\msys2_shell.cmd -full-path
In the MinGW window that opens, run:
$ cd /your dev path/
$ git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg
After about five minutes you'll have the FFMPEG source in the subfolder ffmpeg.
Access the source via:
$ cd ffmpeg
Then run:
$ which link
If it doesn't print the VS path from PATH, but usr/link or usr/bin/link instead, rename that one out of the way, for example:
$ mv /usr/bin/link.exe /usr/bin/msys-link.exe
If it does print the VS path, skip the $ mv step.

Finally, run configure with --toolchain=msvc, plus whatever other options you want:
$ ./configure --toolchain=msvc
(You can list the available options via ./configure --help.)
It may appear inactive for a long time. Once it's done you'll get a couple of pages of output.

Then run:
$ make
$ make install

Note for static builds (configure with --enable-static): although you get Windows static libraries, they are produced with the extension .a. Just rename them to .lib.
(You can use cmd: ren *.a *.lib)

Using FFMPEG

To just copy from the RTSP stream to a file, keeping the source's profile, level and so on, the loop is:

read a network packet: av_read_frame(rtsp_format_context)
pass it to the MP4: av_write_frame(video_file_format_context)

You don't need to open an AVCodecContext, a decoder or an encoder; you only need avformat_open_input for the input, plus the video file's AVFormatContext and AVIOContext.

If you want to re-encode, you have to:

1. Read a network packet: av_read_frame(rtsp_format_context)
2. Pass the packet to the decoder: avcodec_send_packet(rtsp_decoder_context)
3. Read frames from the decoder (in a loop): avcodec_receive_frame(rtsp_decoder_context)
4. Send each decoded frame to the encoder: avcodec_send_frame(video_file_encoder_context)
5. Read packets from the encoder (in a loop): avcodec_receive_packet(video_file_encoder_context)
6. Send each encoded packet to the output video: av_write_frame(video_file_format_context)
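
The control flow of the re-encode steps above is easy to get wrong, so here is a minimal sketch of just the looping pattern. It uses a toy stand-in for the codecs so that it runs without FFmpeg: ToyCodec, send, receive, flush and transcode are hypothetical names, not FFmpeg API. In real code the corresponding calls would be avcodec_send_packet / avcodec_receive_frame and avcodec_send_frame / avcodec_receive_packet, with "needs more input" reported as AVERROR(EAGAIN) rather than an empty optional.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical stand-in for a codec with send/receive semantics.
// It holds back one item, like a codec with one frame of delay.
class ToyCodec {
public:
    // Hand one input item to the codec.
    void send(int item) { queue_.push_back(item); }

    // Signal end of input so held-back items are released
    // (in FFmpeg this is done by sending a NULL packet or frame).
    void flush() { flushing_ = true; }

    // Next output item, or nullopt when more input is needed
    // (FFmpeg reports that case as AVERROR(EAGAIN)).
    std::optional<int> receive() {
        const std::size_t delay = flushing_ ? 0 : 1;
        if (queue_.size() <= delay) return std::nullopt;
        const int out = queue_.front();
        queue_.pop_front();
        return out;
    }

private:
    std::deque<int> queue_;
    bool flushing_ = false;
};

// Decode every input packet, re-encode every decoded frame, and
// collect what would be written to the output file.
std::vector<int> transcode(const std::vector<int>& packets) {
    ToyCodec decoder, encoder;
    std::vector<int> written;

    // Read packets from the encoder until it wants more input.
    auto drain_encoder = [&] {
        while (auto pkt = encoder.receive()) written.push_back(*pkt);
    };
    // Read frames from the decoder until it wants more input,
    // sending each one to the encoder.
    auto drain_decoder = [&] {
        while (auto frame = decoder.receive()) {
            encoder.send(*frame);
            drain_encoder();
        }
    };

    for (int packet : packets) {
        decoder.send(packet);
        drain_decoder();
    }
    // At end of stream, flush both stages so nothing is left inside.
    decoder.flush();
    drain_decoder();
    encoder.flush();
    drain_encoder();

    assert(written.size() == packets.size());
    return written;
}
```

Calling transcode({1, 2, 3}) returns the same three items in order. The point is that each send is followed by a receive loop, and that both stages are flushed at the end; skipping the flush would leave the last items stuck inside the codecs.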

Some gotchas

Copy out the width, height, and pixel format manually. For H264 the pixel format is usually YUV420P.
As an example, for level 3.1, profile high:

  AVCodecParameters * video_file_codec_params = video_file_stream->codecpar;
  video_file_codec_params->profile = FF_PROFILE_H264_HIGH;
  video_file_codec_params->format = AV_PIX_FMT_YUV420P;
  video_file_codec_params->level = 31;
  video_file_codec_params->width = rtsp_vidstream->codecpar->width;
  video_file_codec_params->height = rtsp_vidstream->codecpar->height;

libx264 accepts an x264 preset via the options parameter of avcodec_open2. Example for the "veryfast" preset:

  AVDictionary * mydict = nullptr; // must start as nullptr before av_dict_set
  av_dict_set(&mydict, "preset", "veryfast", 0);
  errorNum = avcodec_open2(video_file_encoder_context, video_file_encoder_codec, &mydict);
  // avcodec_open2 returns < 0 for errors.
  // Recognised options will be removed from the mydict variable.
  // If all are recognised, mydict will be NULL.
  av_dict_free(&mydict); // releases any options that were not consumed

Output timing is a volatile thing. Set the following before calling avcodec_open2:

  video_file_stream->avg_frame_rate = rtsp_vidstream->avg_frame_rate;
  video_file_stream->r_frame_rate = rtsp_vidstream->r_frame_rate;
  video_file_stream->time_base = rtsp_vidstream->time_base;
  video_file_encoder_context->time_base = rtsp_vidstream_codec_context->time_base;
  // Decreasing GOP size for more seek positions doesn't end well.
  // libx264 forces the new GOP size.
  video_file_encoder_context->gop_size = rtsp_vidstream_codec_context->gop_size;
  if ((errorNum = avcodec_open2(video_file_encoder_context,...)) < 0) {
      // an error...
  }

With H264, packets may be written to the file at double speed, so playback runs twice as fast. To correct this, set the encoded packets' timing manually:
```
  packet->pts = packet->dts = frameNum++;
  av_packet_rescale_ts(packet, video_file_encoder_context->time_base, video_file_stream->time_base);
  packet->pts *= 2;
  packet->dts *= 2;
  av_interleaved_write_frame(video_file_format_context, packet);
  // av_interleaved_write_frame returns < 0 for errors.
```
Note that we switch from av_write_frame to av_interleaved_write_frame, and set both PTS and DTS. frameNum should be an int64_t, and should start from 0 (although that's not required).

Also note that the parameters of the av_packet_rescale_ts call are the video file encoder context's time base and the video file stream's time base; RTSP isn't involved.
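
To make the rescaling step concrete, here is a small worked sketch of the arithmetic. It is a simplified stand-in for what av_packet_rescale_ts does to a timestamp, not the FFmpeg implementation: TimeBase, rescale_ts and to_seconds are hypothetical names, and the time bases 1/25 (encoder) and 1/90000 (stream) are assumed values for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for a time base: one tick lasts num/den seconds.
struct TimeBase {
    int64_t num;
    int64_t den;
};

// Convert a timestamp from one time base to another, rounding to nearest.
// Sketch only: non-negative timestamps, no overflow protection.
int64_t rescale_ts(int64_t ts, TimeBase from, TimeBase to) {
    assert(ts >= 0 && from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    const int64_t numerator = ts * from.num * to.den;
    const int64_t denominator = from.den * to.num;
    return (numerator + denominator / 2) / denominator;
}

// Playback time in seconds that a timestamp represents in a given time base.
double to_seconds(int64_t ts, TimeBase tb) {
    return static_cast<double>(ts) * tb.num / tb.den;
}
```

With these assumed values, rescale_ts(50, {1, 25}, {1, 90000}) gives 180000, and both 50 ticks at 1/25 and 180000 ticks at 1/90000 correspond to 2 seconds. Writing the raw value 50 into a 1/90000 stream without rescaling would instead be read as about 0.56 milliseconds, which is why the source and destination time bases passed to the rescale call matter.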

VLC media player won't play H.264 streams encoded with an FPS of 4 or lower. So if your RTSP streams show the first decoded frame and never progress, or show pure green until the video ends, make sure your FPS is high enough. (Observed with VLC v2.2.4.)