c# | colors | bitmap | processing-efficiency

Average color of bitmap


I'm looking for an extremely efficient and accurate way to determine the average RGB value of a bitmap. I currently have a method that uses LockBits and goes pixel by pixel, and it takes approximately 25% of my CPU at 30 Hz.
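
For reference, the loop being described is roughly this shape (a sketch assuming a 32bpp ARGB bitmap and a project compiled with /unsafe; names are illustrative):

    using System;
    using System.Drawing;
    using System.Drawing.Imaging;

    // Rough shape of the LockBits scan described above.
    static Color GetAverageColor(Bitmap bmp)
    {
        var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);
        BitmapData data = bmp.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
        long sumB = 0, sumG = 0, sumR = 0;
        int pixelCount = bmp.Width * bmp.Height;

        unsafe
        {
            for (int y = 0; y < bmp.Height; y++)
            {
                byte* px = (byte*)data.Scan0.ToPointer() + y * data.Stride;
                for (int x = 0; x < bmp.Width; x++, px += 4)
                {
                    sumB += px[0];   // 32bpp ARGB is laid out B, G, R, A in memory
                    sumG += px[1];
                    sumR += px[2];
                }
            }
        }

        bmp.UnlockBits(data);
        return Color.FromArgb((int)(sumR / pixelCount),
                              (int)(sumG / pixelCount),
                              (int)(sumB / pixelCount));
    }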

I've managed to get it down to ~15% by looking at every third pixel; however, I'm confident there is a better way. I've also tried moving the calculations to the GPU (NVIDIA CUDA), but due to my inexperience with GPU programming it only took longer.

I've thought about things like applying a blur, but that doesn't reduce the number of pixels and thus wouldn't reduce the amount of calculation.

I'd like to hear your ideas regarding this interesting topic.


Solution

You can write a C++ DLL that does the same calculation with SIMD-optimized/vectorized code using intrinsics. CPU time is then spent far more efficiently, even at the same usage percentage. Process the unaligned header part with scalar code, then process the remaining aligned part with the faster intrinsic functions.
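
The suggestion above is about a native C++ DLL; purely as an illustration of the same widen-and-accumulate idea in managed code, here is a sketch using System.Numerics.Vector<T> (my own example, assuming a tightly packed BGRA byte[]; on older frameworks it needs the System.Numerics.Vectors package and the 64-bit RyuJIT to actually emit SIMD):

    using System;
    using System.Numerics;

    // Sums the four interleaved channels of a tightly packed BGRA byte[] with Vector<T>.
    static void SumChannelsSimd(byte[] bgra, out long b, out long g, out long r, out long a)
    {
        int vecBytes = Vector<byte>.Count;          // 16 or 32 depending on hardware
        var acc = new Vector<uint>[4];              // per-lane partial sums, start at zero
        int i = 0;

        // Vectorized body: widen bytes to uints so the lane sums don't overflow
        // (safe up to roughly 16 million iterations per lane).
        for (; i <= bgra.Length - vecBytes; i += vecBytes)
        {
            var v = new Vector<byte>(bgra, i);
            Vector.Widen(v, out Vector<ushort> lo16, out Vector<ushort> hi16);
            Vector.Widen(lo16, out Vector<uint> v0, out Vector<uint> v1);
            Vector.Widen(hi16, out Vector<uint> v2, out Vector<uint> v3);
            acc[0] += v0; acc[1] += v1; acc[2] += v2; acc[3] += v3;
        }

        // Fold the vector lanes into four channel totals. Lane j of accumulator k
        // corresponds to byte (k * Vector<uint>.Count + j) of each chunk, so "% 4"
        // picks the channel (B, G, R, A order in memory).
        var totals = new long[4];
        for (int k = 0; k < 4; k++)
            for (int j = 0; j < Vector<uint>.Count; j++)
                totals[(k * Vector<uint>.Count + j) % 4] += acc[k][j];

        // Scalar tail for whatever doesn't fill a full vector.
        for (; i < bgra.Length; i++)
            totals[i % 4] += bgra[i];

        b = totals[0]; g = totals[1]; r = totals[2]; a = totals[3];
    }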

If this isn't enough, try moving only half or even a quarter of the image to the GPU, since PCIe is the bottleneck.

Pipelining also helps to hide some of the latency of copying to the GPU; it uses more CPU while it runs, but it finishes sooner, so fewer total cycles are spent.

If a bitmap is already in the CPU cache, the CPU should be able to process it concurrently while the GPU works on a "mapped" memory tile (another bitmap, or part of the same bitmap) without the two bottlenecking each other on RAM. Don't copy the data to the GPU if it is meant to be streamed; let the GPU map it through its own memory controller using the proper access functions or flags (zero-copy access to pinned host memory).
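
A sketch of driving that kind of split from C#: the GPU gets one half of the frame while the CPU sums the other half concurrently. gpuPartialSums below is a hypothetical placeholder for whatever CUDA/OpenCL interop wrapper you end up writing; it is not a real API.

    using System;
    using System.Threading.Tasks;

    // 'gpuPartialSums' must return per-channel sums for bgra[offset .. offset + count).
    static long[] AverageSplit(byte[] bgra, Func<byte[], int, int, long[]> gpuPartialSums)
    {
        int half = (bgra.Length / 2) & ~3;   // keep the split on a pixel (4-byte) boundary

        // Hand the second half to the GPU while the CPU sums the first half.
        Task<long[]> gpuTask = Task.Run(() => gpuPartialSums(bgra, half, bgra.Length - half));

        long[] cpu = new long[4];
        for (int i = 0; i < half; i++)
            cpu[i % 4] += bgra[i];           // B, G, R, A sums for the CPU half

        long[] gpu = gpuTask.Result;
        long[] total = new long[4];
        for (int c = 0; c < 4; c++)
            total[c] = cpu[c] + gpu[c];      // divide by the pixel count to get the averages
        return total;
    }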

    The "mapping"'s start point could be bitmap byte array's first multiple of 4096 addressed element.

If you have an integrated GPU, try OpenCL on it, because it sits closer to RAM (it shares system memory, so there is no PCIe copy).


For a pure C# solution, try multiple accumulators to use the CPU pipelines better, and use them in an unsafe context. Read by int or long, not byte by byte, then split out the channels with bit tricks, unless C# is already vectorizing the loop for you.
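
A minimal sketch of that pure C# route (my own illustration, assuming a 32bpp ARGB lock obtained via LockBits and a project compiled with /unsafe): read whole pixels as uints, keep two independent accumulators per channel, and split the channels with shifts and masks.

    using System.Drawing.Imaging;

    static unsafe void SumChannelsUnsafe(BitmapData data, int width, int height,
                                         out long b, out long g, out long r)
    {
        long b0 = 0, b1 = 0, g0 = 0, g1 = 0, r0 = 0, r1 = 0;   // two accumulators per channel

        for (int y = 0; y < height; y++)
        {
            uint* row = (uint*)((byte*)data.Scan0.ToPointer() + y * data.Stride);
            int x = 0;

            // Two pixels per iteration; independent accumulators let the CPU overlap the adds.
            for (; x + 1 < width; x += 2)
            {
                uint p0 = row[x];
                uint p1 = row[x + 1];
                b0 += p0 & 0xFF;         b1 += p1 & 0xFF;
                g0 += (p0 >> 8) & 0xFF;  g1 += (p1 >> 8) & 0xFF;
                r0 += (p0 >> 16) & 0xFF; r1 += (p1 >> 16) & 0xFF;
            }

            // Odd pixel at the end of the row, if any.
            for (; x < width; x++)
            {
                uint p = row[x];
                b0 += p & 0xFF;
                g0 += (p >> 8) & 0xFF;
                r0 += (p >> 16) & 0xFF;
            }
        }

        b = b0 + b1; g = g0 + g1; r = r0 + r1;
    }

Call it between LockBits and UnlockBits and divide the three sums by width * height to get the averages.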


Scanning for an average doesn't use the multiplication units, so you can do multiplication-heavy work at the same time with interleaved code or asynchronously. Maybe you can blend some other bitmaps in the meantime?
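
As a toy illustration of that interleaving idea (my own sketch, not from the answer): the loop below does the addition-only channel sums and a multiply-heavy blend of two other buffers in the same pass, so both kinds of execution units stay busy. All arrays are assumed to be equally sized, tightly packed BGRA buffers.

    static long[] SumWhileBlending(byte[] frame, byte[] a, byte[] b, byte[] blend)
    {
        long[] sums = new long[4];                       // B, G, R, A totals for 'frame'
        for (int i = 0; i < frame.Length; i++)
        {
            sums[i % 4] += frame[i];                     // addition-only averaging work
            blend[i] = (byte)((a[i] * 3 + b[i]) >> 2);   // multiply work riding along (75/25 blend)
        }
        return sums;                                     // divide by the pixel count afterwards
    }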


    c[i]=a[i]+b[i]
    

    is 18 times faster with fully optimized gpgpu method compared to simple C# one-liner. I'm using Visual Studio 2015 Community Edition (project in release mode and 64-bit targeted). Using Intel HD-400 iGPU(600MHz) and C3060(1.6GHz) (single channel RAM) this is a low end laptop and CPU usage was %50ish instead of %70ish of pure C#.