Search code examples
c#performancegetpixellockbits

Improve speed of video correction algorithm based on locked bitmap, getpixel and optimized auxiliary functions


I wrote this code to remove annoying patterns in a video due to camera malfunction. The problem is that to encode a 2 Minute video this algorithm needs more than 2 hours. I want to significantly reduce the time needed.

The algorithm iterates over each image, looks at each 4 pixels from there, creates the average and if the average is below a threshold, sets the current pixel to white. I could use step = 2 and set the 2x2 matrix to white, yet this deteriorates images quality and only increases speed by half.

I already added the lockbitmap, lockbits and improved the auxiliary functions.

Before and after the using (d2 = reader.ReadVideoFrame())-snippet I have an aforge videoreader and writer based on ffmpeg.

using (d2 = reader.ReadVideoFrame())
{
    LockBitmap lockBitmap = new LockBitmap(d2);
    lockBitmap.LockBits();

    Color v = Color.FromArgb(240, 237, 241);
    for (int x = 0; x < lockBitmap.Width-1; x = x + 1)
    {
        for (int y = 0; y < lockBitmap.Height-1; y = y + 1)
        {
            Color dus = durchschnitt(lockBitmap.GetPixel(x, y),
                lockBitmap.GetPixel(x + 1, y),
                lockBitmap.GetPixel(x, y + 1),
                lockBitmap.GetPixel(x + 1, y + 1));
            if (abstand(dus, v) < 50)
            {
                lockBitmap.SetPixel(x, y, Color.White);
            }
        }
    }
    lockBitmap.UnlockBits();
}

Auxiliary functions:

private Color durchschnitt(Color c1, Color c2, Color c3, Color c4)
{
    return Color.FromArgb((int)((c1.R + c2.R + c3.R + c4.R) / 4),
        (int)((c1.G + c2.G + c3.G + c4.G) / 4),
        (int)((c1.B + c2.B + c3.B + c4.B) / 4));
}

and

private double abstand(Color c1, Color c2)
{
    return Math.Sqrt(Math.Pow(c2.R - c1.R, 2) + Math.Pow(c2.G - c1.G, 2) + Math.Pow(c2.B - c1.B, 2));
}

LockBitmap is from here.


Solution

  • This is not how lockBits work.

    In short you need to lock the bits to get access to the scanline via pointers. Best to use 32bpp for integer access. You can calculate the pixel in the contiguous array as follows.

    You will need to decorate you class or your method with the unsafe keyword and also set the project build option to use unsafe code, you use pointers.

    var w = bmp.Width;
    var h = bmp.Height;
    
    // lock the array for direct access
    var data = bmp.LockBits(new Rectangle(0, 0, w, h), ImageLockMode.ReadWrite, PixelFormat.Format32bppPArgb);
    var scan0Ptr = (int*)data.Scan0;
    
    // get the stride
    var stride = data.Stride / 4;
    
    // scan all x
    for (var x = 0; x < w; x++)
    {
       var pX = scan0Ptr + x;
    
       // scan all y
       for (var y = 0; y < h; y++)
       {
          // this is now your pixel *p, which essentially is a pointer to
          // to a memory which contains your pixel
          var p = pX + y * stride;
    
          // or for better access to X and Y
          var p = scan0Ptr + x + y * stride;
    
          // or alternatively you can just access you pixel with the following notation  
          *(scan0Ptr + x + y * stride) // <== just use this like any variable 
    
          // example how to convert pixel *p to a Color
          var color = Color.FromArgb(*p);
    
          // example Convert a Color back to *p and update the pixel
          *p = Color.White.ToArgb();
       }
    }
    // unlock the bitmap
    bmp.UnlockBits(data);
    

    I'll leave the rest of the details up to you, however this will give you the best performance (minus some micro optimizations).

    Lastly reducing the calls to external methods will also give you a performance boost; however, if you really need to, you can help the CLR by using MethodImplOptions Enum for AggressiveInlining

    E.g

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private Color durchschnitt(Color c1, Color c2, Color c3, Color c4)
    

    Also you might get a performance boost, by shifting out the components:

    var r = ((*p >> 16) & 255);
    var g = ((*p >> 8) & 255);
    var b = ((*p >> 0) & 255);
    

    And you could probably multi-thread this workload.