Tags: windows · video-capture · directshow · hardware-acceleration · video-recording

Capturing and displaying live camera content on Windows


I am developing a Windows application that can display a high-quality video feed, record it or take photos from it, and edit them later (up to 4K, and in the near future perhaps 8K). I currently have a working product built with WPF (C#). For capturing and displaying video, I used the AForge.NET library.

My problem is that the application is really slow, with the main performance hit coming from video rendering. Apparently the only way to do this is to have a callback from the AForge library provide a new frame every time one is available. That frame is then placed as an image inside an Image element. I believe you can see where the performance hit comes from, especially for high-resolution imagery.

My experience with WPF and these enormous libraries has made me rethink how I want to program in general; I do not want to make bad software that takes up everyone's time by being slow (see the Handmade Network for more on the "why").

The problem is that camera capture and display was hell in WPF/C#, but I do not seem to be better off anywhere else (on Windows, that is). One option would be to use mostly C++ and DirectShow. This is an okay-ish solution, but it feels outdated in terms of performance, and it is built upon Microsoft's COM system, which I prefer to avoid. There are options to render with hardware using Direct3D, but DirectShow and Direct3D do not play nicely together.

I have researched how other applications achieve this. VLC uses DirectShow, but this only shows that DirectShow suffers from large latency; I assume this is because VLC was not intended for real-time purposes. OBS Studio uses whatever Qt uses, but I was unable to find out how they do it. OpenCV grabs frames and blits them to the screen, which is not efficient at all, but that suffices for the OpenCV audience. Lastly, there is the integrated webcam app in Windows. For some reason that app is able to record and play back in real time without a large performance hit. I was not able to figure out how it does this, nor did I find any other solution achieving comparable results.

TL;DR: My questions are: how would I go about efficiently capturing and rendering a camera stream, preferably hardware-accelerated? Is it possible to do this on Windows without going through DirectShow? And lastly, am I asking too much of commodity devices when I want them to process 4K footage in real time?

I have not found anyone doing this in a way that satisfies my needs, which makes me feel both desperate and guilty at the same time. I would have preferred not to bother Stack Overflow with this problem.

Many thanks in advance for an answer, or for advice on this topic in general.


Solution

  • Your question is about a combination of several technologies: video capture, video presentation, and what it takes to connect the two.

    On Windows there are two video-related APIs (if we leave the ancient VfW aside): DirectShow and Media Foundation. Both APIs sit on largely shared underlying layers, and for this reason they offer similar video capture capabilities and performance, including reasonably low capture latency. As things stand now, the use of DirectShow is not recommended: the API is at its end of life and is mostly abandoned technology. At the same time, you will probably find DirectShow better documented, more versatile, and supplied with orders of magnitude more supplementary material and third-party software. The libraries you mentioned are all built on top of one of these technologies (VfW, DirectShow, Media Foundation), with implementation quality inferior to the original operating system APIs.

    In practice, you capture video with either of the two, preferably Media Foundation as the current technology; a minimal capture loop is sketched below.
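
    As an illustration, here is a minimal synchronous capture sketch (C++) using Media Foundation's source reader: it activates the first available camera and pulls frames in a loop. Error handling is elided; real code must check every HRESULT.

    ```cpp
    // Minimal sketch, synchronous variant: open the first video capture
    // device with Media Foundation and pull frames with IMFSourceReader.
    #include <windows.h>
    #include <mfapi.h>
    #include <mfidl.h>
    #include <mfreadwrite.h>
    #pragma comment(lib, "mfplat.lib")
    #pragma comment(lib, "mf.lib")
    #pragma comment(lib, "mfreadwrite.lib")
    #pragma comment(lib, "mfuuid.lib")

    int main()
    {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);
        MFStartup(MF_VERSION);

        // Enumerate video capture devices.
        IMFAttributes* attrs = nullptr;
        MFCreateAttributes(&attrs, 1);
        attrs->SetGUID(MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
                       MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID);
        IMFActivate** devices = nullptr;
        UINT32 count = 0;
        MFEnumDeviceSources(attrs, &devices, &count);
        if (count == 0) return 1;

        // Activate the first camera and wrap it in a source reader.
        IMFMediaSource* source = nullptr;
        devices[0]->ActivateObject(IID_PPV_ARGS(&source));
        IMFSourceReader* reader = nullptr;
        MFCreateSourceReaderFromMediaSource(source, nullptr, &reader);

        // Pull a few frames synchronously.
        for (int i = 0; i < 100; i++)
        {
            DWORD streamIndex = 0, flags = 0;
            LONGLONG timestamp = 0;
            IMFSample* sample = nullptr;
            reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                               0, &streamIndex, &flags, &timestamp, &sample);
            if (sample)
            {
                // Hand the sample over to the rendering side here.
                sample->Release();
            }
        }

        reader->Release();
        source->Shutdown();
        source->Release();
        for (UINT32 i = 0; i < count; i++) devices[i]->Release();
        CoTaskMemFree(devices);
        attrs->Release();
        MFShutdown();
        CoUninitialize();
        return 0;
    }
    ```

    For a UI application you would normally use the reader's asynchronous mode instead (an IMFSourceReaderCallback set via the MF_SOURCE_READER_ASYNC_CALLBACK attribute), so that capture does not block a thread.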

    In my opinion, the most important part of your question is how to organize video rendering. Performance-wise, it is essential to take advantage of hardware acceleration, and in this context it matters what technologies your application is built on and what integration options are available for video presentation/embedding. For a .NET desktop application you would be interested either in mixing Direct3D 11/12 with .NET, or in using the MediaPlayerElement control and researching how to inject video frames into it. As mentioned above, even though third-party libraries are available, you should not expect them to solve these problems in an appropriate way. At the very least, you want to understand the data flow in the video pipeline; a bare-bones Direct3D 11 presentation path is sketched below.
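
    To give an idea of what that path looks like, here is a Direct3D 11 sketch: a swap chain is created once for a window, and each captured frame, assumed here to already live in a GPU texture of matching size and format, is copied into the back buffer and presented. In a WPF application you would host this through interop (e.g. an HwndHost); the names below are illustrative.

    ```cpp
    // Minimal sketch: hardware-accelerated presentation with Direct3D 11.
    #include <windows.h>
    #include <d3d11.h>
    #pragma comment(lib, "d3d11.lib")

    ID3D11Device*        g_device    = nullptr;
    ID3D11DeviceContext* g_context   = nullptr;
    IDXGISwapChain*      g_swapChain = nullptr;

    void CreateDeviceAndSwapChain(HWND hwnd, UINT width, UINT height)
    {
        DXGI_SWAP_CHAIN_DESC desc = {};
        desc.BufferCount       = 2;
        desc.BufferDesc.Width  = width;
        desc.BufferDesc.Height = height;
        desc.BufferDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
        desc.BufferUsage       = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        desc.OutputWindow      = hwnd;
        desc.SampleDesc.Count  = 1;
        desc.Windowed          = TRUE;
        desc.SwapEffect        = DXGI_SWAP_EFFECT_FLIP_DISCARD;

        // D3D11_CREATE_DEVICE_VIDEO_SUPPORT lets Media Foundation share
        // this device for hardware decoding (see the next sketch).
        D3D11CreateDeviceAndSwapChain(
            nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
            D3D11_CREATE_DEVICE_BGRA_SUPPORT | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
            nullptr, 0, D3D11_SDK_VERSION, &desc,
            &g_swapChain, &g_device, nullptr, &g_context);
    }

    void PresentFrame(ID3D11Texture2D* frameTexture)
    {
        // GPU-to-GPU copy into the back buffer, then flip. The pixels
        // never make a round trip through system memory.
        ID3D11Texture2D* backBuffer = nullptr;
        g_swapChain->GetBuffer(0, IID_PPV_ARGS(&backBuffer));
        g_context->CopyResource(backBuffer, frameTexture);
        backBuffer->Release();
        g_swapChain->Present(1, 0); // 1 = wait for v-sync
    }
    ```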

    Then you have the problem of connecting the video capture (not accelerated by video hardware) and the hardware-accelerated video rendering. There can be multiple solutions here, but the important point is that DirectShow's support for hardware acceleration is limited and stopped evolving at Direct3D 9, which is outdated nowadays. This is another reason to say farewell to this, no doubt, excellent piece of technology. You want to investigate your options for placing captured video content into Direct3D 11/Direct3D 12/Direct2D as early in the pipeline as possible, and to use current, standard technologies for the processing that follows. The actual technologies may vary: Media Foundation, Direct3D 11/12, the mentioned MediaPlayerElement control, and a few other options such as Direct2D are all good candidates. On the way to great, or at least reasonable, performance you want to minimize the use of third-party libraries, however popular, and whatever buzzwords they carry in their titles. One way to keep frames on the GPU end to end is sketched below.
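
    Here is a sketch of one way to make that connection with Media Foundation: hand the source reader an IMFDXGIDeviceManager wrapping your Direct3D 11 device, and it will deliver samples backed by D3D11 textures (using hardware decoders for compressed formats where available), so frames stay on the GPU from capture to presentation. This assumes the device and media source from the previous sketches; note also that a device shared this way needs multithread protection enabled (ID3D10Multithread::SetMultithreadProtected).

    ```cpp
    // Minimal sketch: a source reader that outputs D3D11-backed samples.
    // `source` and `device` come from the previous sketches; error
    // handling is elided.
    #include <mfapi.h>
    #include <mfidl.h>
    #include <mfreadwrite.h>
    #include <d3d11.h>

    IMFSourceReader* CreateAcceleratedReader(IMFMediaSource* source,
                                             ID3D11Device* device)
    {
        // Share the Direct3D 11 device with Media Foundation.
        UINT resetToken = 0;
        IMFDXGIDeviceManager* manager = nullptr;
        MFCreateDXGIDeviceManager(&resetToken, &manager);
        manager->ResetDevice(device, resetToken);

        IMFAttributes* attrs = nullptr;
        MFCreateAttributes(&attrs, 2);
        attrs->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, manager);
        // Allow MF to insert a (hardware) video processor for conversion.
        attrs->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING,
                         TRUE);

        IMFSourceReader* reader = nullptr;
        MFCreateSourceReaderFromMediaSource(source, attrs, &reader);
        attrs->Release();
        manager->Release();
        return reader;
    }

    // Samples read from this reader carry DXGI buffers; unwrap the
    // underlying texture and feed it straight to PresentFrame() above.
    ID3D11Texture2D* TextureFromSample(IMFSample* sample)
    {
        IMFMediaBuffer* buffer = nullptr;
        sample->GetBufferByIndex(0, &buffer);

        IMFDXGIBuffer* dxgiBuffer = nullptr;
        buffer->QueryInterface(IID_PPV_ARGS(&dxgiBuffer));

        ID3D11Texture2D* texture = nullptr;
        dxgiBuffer->GetResource(IID_PPV_ARGS(&texture));

        dxgiBuffer->Release();
        buffer->Release();
        return texture; // caller releases
    }
    ```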

    4K footage can be captured and processed in real time; however, you normally either have professional video capture hardware, or the content arrives compressed and you are expected to decompress it with hardware acceleration. Commodity 4K webcams typically deliver MJPEG or H.264 precisely because raw 4K exceeds the bandwidth of their USB links, so with hardware decoding in the pipeline, real-time 4K on commodity hardware is realistic.