c++ video simulator ms-media-foundation

How to use Media Foundation when application is video source and streams video via network?


I am about to write a simulator of sorts in C++. It is a console application which simulates stuff and has a RESTful socket API for input which impacts/contributes to the simulation. In order to see what is going on in the simulation, I had the maybe brilliant idea to generate a series of images (bitmaps) which reflect the simulator state, and then to simply stream them as a movie so I can watch my simulator simulating from a media player such as VLC or Windows Media Player. Or to save the video to a file (second option). Or both.

Being on Windows 7 (x64), Windows Media Foundation looks like the technology to use for that. However, I do not feel like becoming a Media Foundation expert and would rather focus on my simulator.

Here is how it is supposed to work:

  • Simulator code has state, which can be rendered to a bitmap (width, height, ...). The simulator is not running in real time (it is not clocked). It does its calculations, and depending on what happens to the simulator's input, the state changes in non-periodic ways.
  • Whenever the state changes, render a bitmap and queue it up for video output. Since the video is (depending on video stream encoding) expecting some kind of constant frame rate, the last image shall be simply re-transmitted until there is a new frame, thus creating the video. But they seem to treat the frame-duration rather like a constant.
  • Media Foundation does not seem to contain ready-made code to realize a (UDP or TCP based RTP/RTSP) interface to my application. So I hope it can do the conversion from a bitmap to a JPEG frame for me and provide me with some form of media clock.
    • I think I figured out how those RTP etc. headers work and can easily program the network side myself, if that approach is the fastest one.
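
For reference, the fixed part of an RTP header (RFC 3550) is 12 bytes, all fields in network byte order. The following is only a minimal sketch of packing it by hand; `RtpHeaderFields` and `PackRtpHeader` are hypothetical names for illustration and not part of Media Foundation or any other library. It assumes no CSRC entries, no padding and no header extension. Note that JPEG over RTP (RFC 2435) puts an additional JPEG-specific header after these 12 bytes, which is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the fields of the fixed RTP header (RFC 3550).
struct RtpHeaderFields
{
    uint8_t  payloadType;    // 7 bits; 26 is the static payload type for JPEG (RFC 3551)
    bool     marker;         // for video: set on the last packet of a frame
    uint16_t sequenceNumber; // incremented by one for every packet sent
    uint32_t timestamp;      // 90 kHz clock for video payloads
    uint32_t ssrc;           // random identifier of this stream
};

// Serializes the 12-byte fixed header in network byte order (big endian).
std::array<uint8_t, 12> PackRtpHeader(const RtpHeaderFields& f)
{
    assert(f.payloadType < 128); // payload type must fit into 7 bits

    std::array<uint8_t, 12> h{};
    h[0]  = 0x80; // version 2, no padding, no extension, CSRC count 0
    h[1]  = static_cast<uint8_t>((f.marker ? 0x80 : 0x00) | f.payloadType);
    h[2]  = static_cast<uint8_t>(f.sequenceNumber >> 8);
    h[3]  = static_cast<uint8_t>(f.sequenceNumber & 0xFF);
    h[4]  = static_cast<uint8_t>(f.timestamp >> 24);
    h[5]  = static_cast<uint8_t>((f.timestamp >> 16) & 0xFF);
    h[6]  = static_cast<uint8_t>((f.timestamp >> 8) & 0xFF);
    h[7]  = static_cast<uint8_t>(f.timestamp & 0xFF);
    h[8]  = static_cast<uint8_t>(f.ssrc >> 24);
    h[9]  = static_cast<uint8_t>((f.ssrc >> 16) & 0xFF);
    h[10] = static_cast<uint8_t>((f.ssrc >> 8) & 0xFF);
    h[11] = static_cast<uint8_t>(f.ssrc & 0xFF);
    return h;
}
```

The header is the easy part; fragmenting frames to fit the MTU and the RTSP session handling are where most of the work would be.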

What I find difficult right now is to figure out how to best orchestrate and select the parts of Media foundation which would do the encoding.

I think I have to implement a custom IMFMediaSource COM object and use that as the start of the pipeline. But I did not find out how to get the encoded JPEG images or some IMFMediaSink video stream data back from the pipeline so I can feed my network layer with it.

So here is my set of questions I need help with from you Media Foundation experts:

  • Is there a shortcut for me to do what I have in mind? I am not fixated on RTP/RTSP; any form of video format will do, as long as the amount of work I have to invest is minimal and I can use a standard player.
  • Can someone create a short list for me of what I would have to do? It can be a pretty bird's-eye view. Like "Create COM object which implements interfaces [list] - Use Media Foundation codecs and whatnot [names] to build the pipeline - Implement IMFMediaSink in your sink object and get the data like...this ... The media clock is provided by (do I have to clock the pipeline from the back as it is a pull model or is the IMFMediaSink driven by some clock from within the session?) etc."?
  • If I were to use an H.264 encoder - can I get the data back or is it only rendering to file? Would I need extra protocol headers to stream that via a network socket?
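
On the "extra protocol headers" part: an H.264 elementary stream in Annex B form is a sequence of NAL units separated by start codes (`00 00 01` or `00 00 00 01`), and the RTP payload format for H.264 (RFC 6184) carries the NAL units without those start codes. The sketch below only shows how such a buffer could be split. `SplitAnnexB` is a hypothetical helper name, and it assumes that the encoder hands back Annex B data, which would need to be checked for whichever encoder is used.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns (offset, length) of every NAL unit in an Annex B byte stream.
// Start codes and the zero bytes in front of them are not part of a unit.
std::vector<std::pair<std::size_t, std::size_t>>
SplitAnnexB(const uint8_t* data, std::size_t size)
{
    std::vector<std::pair<std::size_t, std::size_t>> units;
    bool inUnit = false;
    std::size_t start = 0;

    // Closes the unit that began at 'start' and ends before 'end'.
    auto closeUnit = [&](std::size_t end) {
        // A NAL unit never ends in 0x00, so trailing zeros belong to
        // the next (4-byte) start code or are padding.
        while (end > start && data[end - 1] == 0) --end;
        if (end > start) units.emplace_back(start, end - start);
    };

    std::size_t i = 0;
    while (i + 3 <= size)
    {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1)
        {
            if (inUnit) closeUnit(i);
            start = i + 3; // first byte after the start code
            inUnit = true;
            i += 3;
        }
        else
        {
            ++i;
        }
    }
    if (inUnit) closeUnit(size);
    return units;
}
```

Each unit found this way would then go into one RTP packet, or be fragmented if it is larger than the MTU allows.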

As I have experience in COM programming and other basics, all I really need is a shortcut so I do not have to study and trial-and-error with Media Foundation to find out how to realize this use case or at least have information helping me to find a good starting point for my efforts.

Thanks in advance!

Here is a draft of how the application could access this:

#include <cstddef> // size_t
#include <cstdint> // uint8_t, uint32_t

struct SourceFormatDescriptor
{
    uint32_t width;
    uint32_t height;
    uint32_t colorDepth; // Bytes per pixel -> RGB (3 bytes per pixel) | RGBA (4 bytes per pixel)
};

struct IVideoStreamReceiver
{
    // Virtual destructor so receivers can be destroyed through this interface.
    virtual ~IVideoStreamReceiver() = default;

    virtual void Starting() = 0;
    virtual void Ending() = 0;
    virtual uint32_t MaxChunkSize() = 0; // Somehow network protocol MTU related
    // Can/should be directly forwarded to network layer and/or saved to file.
    virtual void VideoChunk(size_t chunkSize, const uint8_t* chunk) = 0;
};

// Application object behind which all video rendering voodoo will be implemented.
class VideoRenderer
{
    // Is initialized once and fixed while the streaming is done.
    SourceFormatDescriptor m_sourceDescriptor;

public:
    VideoRenderer(const SourceFormatDescriptor& sourceDescriptor);

    void Start();
    void Stop();

    // Not sure who drives the renderer (Media foundation or external clock?)
    void MediaClockTick(); // dummy reminder function

    void AddReceiver(IVideoStreamReceiver* receiver);

    // Whenever a new frame is rendered it is passed to the video renderer.
    void NewFrame(const uint8_t * frameData, size_t frameSize);
};
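
The "re-transmit the last image until there is a new frame" idea can be sketched independently of any encoder. The class below is only an illustration under stated assumptions: `FrameRepeater` and `Tick` are hypothetical names, an external timer is assumed to call `Tick` once per frame interval, and the timestamp is returned in 100-nanosecond units because that is the unit Media Foundation uses for sample times.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Keeps the most recent frame and hands it out at a constant frame rate.
class FrameRepeater
{
public:
    explicit FrameRepeater(uint32_t framesPerSecond)
        : m_fps(framesPerSecond)
    {
        assert(framesPerSecond > 0);
    }

    // Called by the simulator whenever its state changed (any thread).
    void NewFrame(const uint8_t* frameData, std::size_t frameSize)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_latest.assign(frameData, frameData + frameSize);
    }

    // Called once per frame interval by whoever drives the clock.
    // Copies the latest frame into 'out' (the same frame again if nothing
    // changed) and returns its presentation time in 100 ns units.
    int64_t Tick(std::vector<uint8_t>& out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        out = m_latest;
        const int64_t time = (m_frameIndex * 10000000LL) / m_fps;
        ++m_frameIndex;
        return time;
    }

private:
    std::mutex           m_mutex;
    std::vector<uint8_t> m_latest;       // empty until the first NewFrame()
    int64_t              m_frameIndex = 0;
    uint32_t             m_fps;
};
```

Computing the time from the frame index rather than adding a fixed duration on every tick avoids accumulating rounding error at frame rates that do not divide evenly.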

Solution

  • I've been using and trying to get a handle on Windows Media Foundation for a while now, and I'm at the other end of the spectrum from being an expert. From what I have experienced of MF, I would recommend finding a different approach for your bitmap-to-video streaming logic. While MF can do the bitmap encoding to H.264 (which you DO want to do if you are going across a network), it only provides very minimal integration with streaming network protocols. As far as I can tell, this is limited to allowing basic RTSP servers to be used with the SourceReader, but there is NO functionality for networking as the output for a SinkWriter, which is what you need. Consequently you'd have to roll your own along the lines you've mentioned, but an RTP/RTSP stack is going to be a bigger job than you think.

    One option that may suit you is ffmpeg, either by directly using one of the pre-built exes or by using the libraries included in the project. Take a look at this SO question. An external library solution such as ffmpeg may not be as clean as using MF, but in your case it will take you a lot less effort. ffmpeg also has much better support for network streaming, including RTP and RTSP.
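
As a rough sketch of the pre-built exe route: ffmpeg can read raw frames from stdin and do the encoding and network output itself. The helper below only builds such a command line from the values in your SourceFormatDescriptor. `BuildFfmpegCommand` is a hypothetical name, and the choice of MPEG-TS over UDP, the libx264 settings and the frame rate are assumptions to adapt; it also assumes an ffmpeg build that includes libx264.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds an ffmpeg command line that reads raw RGB/RGBA frames from stdin,
// encodes them to H.264 and writes an MPEG-TS stream to 'outputUrl'.
std::string BuildFfmpegCommand(uint32_t width, uint32_t height,
                               uint32_t bytesPerPixel, uint32_t fps,
                               const std::string& outputUrl)
{
    assert(bytesPerPixel == 3 || bytesPerPixel == 4);
    const std::string pixFmt = (bytesPerPixel == 3) ? "rgb24" : "rgba";

    return "ffmpeg -f rawvideo -pix_fmt " + pixFmt +
           " -s " + std::to_string(width) + "x" + std::to_string(height) +
           " -framerate " + std::to_string(fps) +
           " -i -"                                  // input: stdin
           " -c:v libx264 -preset ultrafast -tune zerolatency"
           " -pix_fmt yuv420p"                      // widely playable output format
           " -f mpegts " + outputUrl;
}
```

On Windows the resulting string could be passed to `_popen(cmd.c_str(), "wb")`, and each frame written to the returned `FILE*` with `fwrite`, repeating the last frame when nothing changed. With an output URL such as `udp://127.0.0.1:1234`, VLC should be able to play the stream from `udp://@:1234`.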