I am using Microsoft OnnxRuntime to detect and classify objects in images and I want to apply it to real-time video. To do that, I have to convert each frame into an OnnxRuntime Tensor. Right now I have implemented a method that takes around 300ms:
public Tensor<float> ConvertImageToFloatTensor(Bitmap image)
{
// Create the Tensor with the appropiate dimensions for the NN
Tensor<float> data = new DenseTensor<float>(new[] { 1, image.Width, image.Height, 3 });
// Iterate over the bitmap width and height and copy each pixel
for (int x = 0; x < image.Width; x++)
{
for (int y = 0; y < image.Height; y++)
{
Color color = image.GetPixel(x, y);
data[0, y, x, 0] = color.R / (float)255.0;
data[0, y, x, 1] = color.G / (float)255.0;
data[0, y, x, 2] = color.B / (float)255.0;
}
}
return data;
}
I need this code to run as fast as possible since I am representing the output bounding boxes of the detector as a layer on top of the video. Does anyone know a faster way of doing this conversión?
based in the answers by davidtbernal (Fast work with Bitmaps in C#) and FelipeDurar (Grayscale image from binary data) you should be able to access pixels faster using LockBits and a bit of "unsafe" code
public Tensor<float> ConvertImageToFloatTensorUnsafe(Bitmap image)
{
// Create the Tensor with the appropiate dimensions for the NN
Tensor<float> data = new DenseTensor<float>(new[] { 1, image.Width, image.Height, 3 });
BitmapData bmd = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), System.Drawing.Imaging.ImageLockMode.ReadOnly, image.PixelFormat);
int PixelSize = 3;
unsafe
{
for (int y = 0; y < bmd.Height; y++)
{
// row is a pointer to a full row of data with each of its colors
byte* row = (byte*)bmd.Scan0 + (y * bmd.Stride);
for (int x = 0; x < bmd.Width; x++)
{
// note the order of colors is BGR
data[0, y, x, 0] = row[x*PixelSize + 2] / (float)255.0;
data[0, y, x, 1] = row[x*PixelSize + 1] / (float)255.0;
data[0, y, x, 2] = row[x*PixelSize + 0] / (float)255.0;
}
}
image.UnlockBits(bmd);
}
return data;
}
I've compared this piece of code averaging over 1000 runs and got about 3x performance improvement against your original code but results may vary.
Also note I've used 3 channels per pixel as your original answer uses those values only, if you use a 32bpp bitmap, you may change PixelSize to 4 and the last channel should be alpha channel (row[x*PixelSize + 3])