I have trained a PyTorch ResNet50 Faster R-CNN (FPN v2) model in Python and exported it to ONNX format. I need to use C# to load the model and perform predictions, and I have written some test code to do this.
The input image size for the model seems to be 576x720 even though it was trained with 720x576 images. That's not a massive problem because I can easily resize the images, perform predictions, and then resize them back. I don't know why this happened during training, and it may have something to do with my problem.
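Incidentally, the input shape the exported model expects can be confirmed from the session metadata rather than by trial and error; here is a minimal sketch (nothing in it is specific to my model, and onnxFilePath is just a placeholder for the path to the exported file):

using System;
using Microsoft.ML.OnnxRuntime;

// Sketch: print each model input's name, element type and declared dimensions
// (dynamic dimensions are typically reported as -1).
using (var session = new InferenceSession(onnxFilePath))
{
    foreach (var input in session.InputMetadata)
    {
        Console.WriteLine($"{input.Key}: {input.Value.ElementType} [{string.Join(", ", input.Value.Dimensions)}]");
    }
}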
The results I'm getting back in C# are not good. No objects are detected at all in my images, whereas the same model works fine from my Python code. I have noticed that in C#, most of the time I receive an error from ONNX saying that it attempted to divide by zero, but when I don't get that error the objects it detects are just rubbish. Here is what the debugger shows for resultsArray:
- resultsArray    {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue[3]}    Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue[]
  [0]             {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue}       Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
  [1]             {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue}       Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
      ElementType      Int64                                            Microsoft.ML.OnnxRuntime.Tensors.TensorElementType
      Name             "2546"                                           string
      Value            {"Attempted to divide by zero."}                 object {Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>}
      ValueType        ONNX_TYPE_TENSOR                                 Microsoft.ML.OnnxRuntime.OnnxValueType
      _disposed        false                                            bool
      _mapHelper       null                                             Microsoft.ML.OnnxRuntime.MapHelper
      _name            "2546"                                           string
      _ortValueHolder  {Microsoft.ML.OnnxRuntime.OrtValueTensor<long>}  Microsoft.ML.OnnxRuntime.IOrtValueOwner {Microsoft.ML.OnnxRuntime.OrtValueTensor<long>}
      _value           {"Attempted to divide by zero."}                 object {Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>}
  [2]             {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue}       Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue
Here is the C# code I have so far. I believe the problem lies in how the image is processed and how the tensor is set up, but I don't know enough about this to be sure.
The model was trained without any explicit normalisation or other preprocessing, just the raw RGB image. Even so, I was able to achieve an average validation IoU approaching 95%.
private void cmdAnalyse_Click(object sender, EventArgs e)
{
    // begin analysis
    if (this.txtONNXFile.Text == "")
    {
        MessageBox.Show("Please select an ONNX file");
        return;
    }
    if (this.originalImage == null)
    {
        MessageBox.Show("Please select an image");
        return;
    }

    // flip the width and height dimensions. Images are 720x576, but the model expects 576x720
    this.rescaledImage = new Bitmap(originalImage.Height, originalImage.Width);
    Graphics graphics = Graphics.FromImage(rescaledImage);
    graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
    graphics.DrawImage(originalImage, 0, 0, rescaledImage.Width, rescaledImage.Height);

    Microsoft.ML.OnnxRuntime.Tensors.Tensor<float> input = new Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<float>(new[] { 1, 3, 720, 576 });

    BitmapData bitmapData = rescaledImage.LockBits(new System.Drawing.Rectangle(0, 0, rescaledImage.Width, rescaledImage.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
    int stride = bitmapData.Stride;
    IntPtr scan0 = bitmapData.Scan0;
    unsafe
    {
        byte* ptr = (byte*)scan0;
        for (int y = 0; y < rescaledImage.Height; y++)
        {
            for (int x = 0; x < rescaledImage.Width; x++)
            {
                int offset = y * stride + x * 3;
                input[0, 0, y, x] = ptr[offset + 2]; // Red channel
                input[0, 1, y, x] = ptr[offset + 1]; // Green channel
                input[0, 2, y, x] = ptr[offset];     // Blue channel
            }
        }
    }
    rescaledImage.UnlockBits(bitmapData);

    var inputs = new List<Microsoft.ML.OnnxRuntime.NamedOnnxValue>
    {
        Microsoft.ML.OnnxRuntime.NamedOnnxValue.CreateFromTensor("images", input)
    };

    // run inference
    var session = new Microsoft.ML.OnnxRuntime.InferenceSession(this.txtONNXFile.Text);
    Microsoft.ML.OnnxRuntime.IDisposableReadOnlyCollection<Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue> results = session.Run(inputs);

    // process results
    var resultsArray = results.ToArray();
    float[] boxes = resultsArray[0].AsEnumerable<float>().ToArray();
    long[] labels = resultsArray[1].AsEnumerable<long>().ToArray();
    float[] confidences = resultsArray[2].AsEnumerable<float>().ToArray();

    var predictions = new List<Prediction>();
    var minConfidence = 0.0f;
    for (int i = 0; i < boxes.Length; i += 4)
    {
        var index = i / 4;
        if (confidences[index] >= minConfidence)
        {
            predictions.Add(new Prediction
            {
                Box = new Box(boxes[i], boxes[i + 1], boxes[i + 2], boxes[i + 3]),
                Label = LabelMap.Labels[labels[index]],
                Confidence = confidences[index]
            });
        }
    }

    System.Drawing.Graphics graph = System.Drawing.Graphics.FromImage(this.rescaledImage);
    // Put boxes, labels and confidence on image and save for viewing
    foreach (var p in predictions)
    {
        System.Drawing.Pen pen = new System.Drawing.Pen(System.Drawing.Color.Red, 2);
        graph.DrawRectangle(pen, p.Box.Xmin, p.Box.Ymin, p.Box.Xmax - p.Box.Xmin, p.Box.Ymax - p.Box.Ymin);
    }
    graph.Flush();
    graph.Dispose();

    // rescale image back
    System.Drawing.Bitmap bmpResult = new Bitmap(this.originalImage.Width, this.originalImage.Height);
    graphics = Graphics.FromImage(bmpResult);
    graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
    graphics.DrawImage(rescaledImage, 0, 0, originalImage.Width, originalImage.Height);
    graphics.Flush();
    graphics.Dispose();
    //graph.ScaleTransform(720, 576);

    this.pbRibeye.Width = bmpResult.Width;
    this.pbRibeye.Height = bmpResult.Height;
    this.pbRibeye.Image = bmpResult;
    //bmpResult.Dispose();
    rescaledImage.Dispose();
}
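(One thing I know is unrelated to the detection problem: the InferenceSession and the results collection are both disposable, so a tidier version would wrap them in using blocks, roughly like this sketch, with the processing otherwise unchanged:)

// Sketch: same inference call with deterministic disposal of native resources.
using (var session = new Microsoft.ML.OnnxRuntime.InferenceSession(this.txtONNXFile.Text))
using (var results = session.Run(inputs))
{
    var resultsArray = results.ToArray();
    // ... process boxes / labels / confidences exactly as above ...
}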
And here is the Python code that works:
ort_session = onnxruntime.InferenceSession(ONNXFile)
# Preprocess the input image
image = Image.open(image_path) # Load the image using PIL
resized_image = image.resize((576, 720)) # If this is omitted then I receive an error regarding the expected input dimensions
transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),  # Convert PIL image to tensor
])
input_tensor = transform(resized_image)
input_tensor = input_tensor.unsqueeze(0) # Add a batch dimension
# Run the model
outputs = ort_session.run(None, {'images': input_tensor.numpy()})
The Python code produces valid output that is in line with the results I was achieving during validation.
It turns out that everything is mostly fine with the code. I stumbled upon the solution: even though the model was not trained with any explicit image normalisation or preprocessing, it needs its input pixels scaled to the [0, 1] range before it will generate sensible predictions. In hindsight this is presumably because torchvision's ToTensor() transform, used in the Python pipeline above, already scales pixel values to [0, 1], so the model was effectively trained on normalised images all along.
The updated code section is:
unsafe
{
    byte* ptr = (byte*)scan0;
    for (int y = 0; y < originalImage.Height; y++)
    {
        for (int x = 0; x < originalImage.Width; x++)
        {
            int offset = y * stride + x * 3;
            input[0, 0, y, x] = ptr[offset + 2] / 255.0f; // Red channel
            input[0, 1, y, x] = ptr[offset + 1] / 255.0f; // Green channel
            input[0, 2, y, x] = ptr[offset] / 255.0f;     // Blue channel
        }
    }
}
This performs the normalisation on each pixel as it is copied from the bitmap's BGR memory layout into the tensor's RGB channel planes.
The reason I can use originalImage now is that I was able to remove all the extra bitmaps from the code: a DenseTensor seems to be essentially a flat buffer, much like a bitmap, so the size/orientation of the image doesn't affect the result as long as the tensor has the correct total length - at least that seems to be the case in C#. Python seems to care about the dimensions being correct.
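To illustrate what I mean about the tensor being a flat buffer, here is a small sketch (not taken from the code above; the helper name and the assumption of a tightly packed BGR byte array are mine) that fills a DenseTensor through its Buffer property instead of the four-index indexer - both write to the same underlying memory:

using Microsoft.ML.OnnxRuntime.Tensors;

// Hypothetical helper: fill a 1x3xHxW tensor from a tightly packed, interleaved BGR byte array.
static DenseTensor<float> FillTensorFromBgr(byte[] bgr, int height, int width)
{
    var tensor = new DenseTensor<float>(new[] { 1, 3, height, width });
    var data = tensor.Buffer.Span;       // flat view over all 1 * 3 * height * width floats
    int planeSize = height * width;      // one colour plane per channel
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            int src = (y * width + x) * 3;  // interleaved BGR source pixel
            int dst = y * width + x;        // position within a single channel plane
            data[0 * planeSize + dst] = bgr[src + 2] / 255.0f; // R plane
            data[1 * planeSize + dst] = bgr[src + 1] / 255.0f; // G plane
            data[2 * planeSize + dst] = bgr[src] / 255.0f;     // B plane
        }
    }
    return tensor;
}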