I am about to write a simulator of sorts in C++. It is a console application which...simulates stuff, and has a RESTful socket API for things that impact/contribute to the simulation. In order to see what is going on in the simulation, I had the maybe brilliant idea to generate a series of images (bitmaps) which reflect the simulator state, and then to simply stream this as a movie so I can watch my simulator simulating from a media player such as VLC or Windows Media Player. Or to save the video to a file (second option). Or both.
Being on Windows 7 (x64), Windows Media Foundation looks like the technology to use for that. But I do not feel like becoming a Media Foundation expert and would rather focus on my simulator.
Here is how it is supposed to work: the simulator renders its state to a series of raw bitmaps, which are encoded into a video stream that is then sent out over the network and/or written to a file. What I find difficult right now is figuring out how to best orchestrate and select the parts of Media Foundation which would do the encoding.
I think I have to implement a custom IMFMediaSource COM object and use that as the start of the pipeline. But I have not found out how to get the encoded JPEG images, or the video stream data from some IMFMediaSink, back out of the pipeline so I can feed my network layer with it.
So here is the question I need help with from you Media Foundation experts: as I have experience in COM programming and the other basics, all I really need is a shortcut so I do not have to study and trial-and-error my way through Media Foundation to find out how to realize this use case, or at least some information that helps me find a good starting point for my efforts.
Thanks in advance!
Here is a draft of how the application could access this:
#include <cstddef> // size_t
#include <cstdint>

struct SourceFormatDescriptor
{
    uint32_t width;
    uint32_t height;
    uint32_t colorDepth; // Bytes per pixel -> RGB (3 bytes per pixel) | RGBA (4 bytes per pixel)
};

struct IVideoStreamReceiver
{
    virtual ~IVideoStreamReceiver() {}
    virtual void Starting() = 0;
    virtual void Ending() = 0;
    virtual uint32_t MaxChunkSize() = 0; // Somehow network protocol MTU related
    // Can/should be directly forwarded to the network layer and/or saved to a file.
    virtual void VideoChunk(size_t chunkSize, const uint8_t* chunk) = 0;
};

// Application object behind which all the video rendering voodoo will be implemented.
class VideoRenderer
{
    // Initialized once and fixed while the streaming is running.
    SourceFormatDescriptor m_sourceDescriptor;
public:
    explicit VideoRenderer(const SourceFormatDescriptor& sourceDescriptor);
    void Start();
    void Stop();
    // Not sure who drives the renderer (Media Foundation or an external clock?)
    void MediaClockTick(); // dummy reminder function
    void AddReceiver(IVideoStreamReceiver* receiver);
    // Whenever a new frame is rendered, it is passed to the video renderer.
    void NewFrame(const uint8_t* frameData, size_t frameSize);
};
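For illustration, a hypothetical use from the simulator's main loop could look like this (NetworkStreamReceiver and the simulator calls are made-up placeholders, not part of the draft above):

// Hypothetical driver loop; assumes some IVideoStreamReceiver implementation
// called NetworkStreamReceiver that forwards chunks to the socket layer.
SourceFormatDescriptor fmt = { 640, 480, 4 }; // 640x480 RGBA
VideoRenderer renderer(fmt);
NetworkStreamReceiver network;
renderer.AddReceiver(&network);
renderer.Start();
while (simulator.IsRunning())
{
    simulator.Step();
    // Render the current simulator state into a bitmap and hand it over.
    const uint8_t* frame = simulator.RenderStateToBitmap();
    renderer.NewFrame(frame, fmt.width * fmt.height * fmt.colorDepth);
}
renderer.Stop();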
I've been using and trying to get a handle on Windows Media Foundation for a while now, and I'm at the other end of the spectrum from being an expert. From what I have experienced of MF, I would recommend finding a different approach for your bitmap-to-video streaming logic. While MF can do the bitmap encoding to H.264 (which you DO want if you are going across a network), it provides only very minimal integration with streaming network protocols. As far as I can tell, this is limited to letting basic RTSP servers be used as input with a SourceReader; there is NO networking functionality on the output side of a SinkWriter, which is what you need. Consequently you'd have to roll your own along the lines you've mentioned, and an RTP/RTSP stack is a bigger job than you might think.
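That said, if you stay with MF for the encoding-to-file half, the SinkWriter part is quite approachable. Below is a minimal, untested sketch in the spirit of the MSDN "Using the Sink Writer" tutorial: it encodes raw RGB32 frames to H.264 in an MP4 file. The format values are placeholders you would wire to your SourceFormatDescriptor, error handling is collapsed into a macro, and COM references are not released, so treat it as a starting point rather than production code. You also need to call MFStartup(MF_VERSION) up front and pWriter->Finalize() plus MFShutdown() when done.

#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <mferror.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

// Placeholder format values.
const UINT32 WIDTH = 640, HEIGHT = 480, FPS = 25, BITRATE = 800000;
const LONGLONG FRAME_DURATION = 10 * 1000 * 1000 / FPS; // in 100-ns units

// Collapsed error handling; note this leaks COM references on failure.
// A real implementation should use CComPtr or similar.
#define CHK(x) do { HRESULT _hr = (x); if (FAILED(_hr)) return _hr; } while (0)

HRESULT CreateWriter(IMFSinkWriter** ppWriter, DWORD* pStream)
{
    IMFSinkWriter* pWriter = nullptr;
    IMFMediaType *pOut = nullptr, *pIn = nullptr;

    CHK(MFCreateSinkWriterFromURL(L"simulation.mp4", nullptr, nullptr, &pWriter));

    // Output stream type: H.264 video.
    CHK(MFCreateMediaType(&pOut));
    CHK(pOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
    CHK(pOut->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264));
    CHK(pOut->SetUINT32(MF_MT_AVG_BITRATE, BITRATE));
    CHK(pOut->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive));
    CHK(MFSetAttributeSize(pOut, MF_MT_FRAME_SIZE, WIDTH, HEIGHT));
    CHK(MFSetAttributeRatio(pOut, MF_MT_FRAME_RATE, FPS, 1));
    CHK(MFSetAttributeRatio(pOut, MF_MT_PIXEL_ASPECT_RATIO, 1, 1));
    CHK(pWriter->AddStream(pOut, pStream));

    // Input type: the uncompressed RGB32 frames the simulator produces.
    // The SinkWriter loads the encoder (and color converter) for you.
    CHK(MFCreateMediaType(&pIn));
    CHK(pIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
    CHK(pIn->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32));
    CHK(pIn->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive));
    CHK(MFSetAttributeSize(pIn, MF_MT_FRAME_SIZE, WIDTH, HEIGHT));
    CHK(MFSetAttributeRatio(pIn, MF_MT_FRAME_RATE, FPS, 1));
    CHK(pWriter->SetInputMediaType(*pStream, pIn, nullptr));

    CHK(pWriter->BeginWriting());
    *ppWriter = pWriter;
    return S_OK;
}

// Wrap one raw frame in an IMFSample and hand it to the writer.
HRESULT WriteFrame(IMFSinkWriter* pWriter, DWORD stream,
                   const BYTE* frameData, LONGLONG timestamp)
{
    const LONG  stride  = 4 * WIDTH;  // RGB32 = 4 bytes per pixel
    const DWORD bufSize = stride * HEIGHT;
    IMFMediaBuffer* pBuffer = nullptr;
    IMFSample*      pSample = nullptr;
    BYTE* pDst = nullptr;

    CHK(MFCreateMemoryBuffer(bufSize, &pBuffer));
    CHK(pBuffer->Lock(&pDst, nullptr, nullptr));
    CHK(MFCopyImage(pDst, stride, frameData, stride, stride, HEIGHT));
    CHK(pBuffer->Unlock());
    CHK(pBuffer->SetCurrentLength(bufSize));

    CHK(MFCreateSample(&pSample));
    CHK(pSample->AddBuffer(pBuffer));
    CHK(pSample->SetSampleTime(timestamp)); // in 100-ns units
    CHK(pSample->SetSampleDuration(FRAME_DURATION));
    return pWriter->WriteSample(stream, pSample);
}

But again: this gets you a file, not a network stream, which brings me to the alternative below.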
One option that may suit you is ffmpeg, either by directly driving one of the pre-built exes or by linking the libraries included in the project. Take a look at this SO question. An external library solution such as ffmpeg may not be as clean as using MF, but in your case it will take you a lot less effort. ffmpeg also has much better support for network streaming, including RTP and RTSP.
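For instance, a low-effort variant is to spawn the pre-built ffmpeg exe and pipe your raw frames into its stdin. A sketch, assuming ffmpeg.exe is on the PATH, 640x480 RGB24 frames at 25 fps, and a made-up receiver address of rtp://127.0.0.1:5004:

#include <cstdio>
#include <cstdint>
#include <vector>

int main()
{
    const int width = 640, height = 480;
    // -f rawvideo / -i -  : read raw frames from stdin
    // -f rtp              : stream the encoded H.264 over RTP
    FILE* ff = _popen(
        "ffmpeg -f rawvideo -pix_fmt rgb24 -s 640x480 -r 25 -i - "
        "-c:v libx264 -preset ultrafast -tune zerolatency "
        "-f rtp rtp://127.0.0.1:5004", "wb");
    if (!ff)
        return 1;

    std::vector<uint8_t> frame(width * height * 3, 0); // one black RGB24 frame
    for (int i = 0; i < 250; ++i) // ~10 seconds of video at 25 fps
    {
        // ...render the simulator state into 'frame' here...
        fwrite(frame.data(), 1, frame.size(), ff);
    }
    _pclose(ff);
    return 0;
}

When started with an RTP output, ffmpeg prints an SDP description of the stream; saving that to a .sdp file and opening it in VLC should let you watch the simulation live. You keep your NewFrame()-style interface and ffmpeg does all the encoding and network plumbing.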