Tags: c#, .net, computer-vision, onnx, onnxruntime

ONNX model is not returning predictions in C#


I have trained a PyTorch ResNet50 Faster R-CNN (FPN v2) model in Python and exported it to ONNX format. I need to use C# to load the model and perform predictions, and I have written some test code to do this.

The input image size for the model seems to be 576x720 even though it was trained with 720x576 images. That's not a massive problem because I can easily resize the images, perform predictions, and then resize them back. I don't know why this happened during training, and it may have something to do with my problem.
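
As a sanity check, the model's expected input shape can be read directly from the session metadata rather than inferred from error messages. A minimal sketch (onnxPath is just a placeholder for the model file path):

    // Print each input's name, element type and dimensions, so the expected
    // height/width (720 vs 576) can be confirmed straight from the model.
    using (var session = new Microsoft.ML.OnnxRuntime.InferenceSession(onnxPath))
    {
        foreach (var input in session.InputMetadata)
        {
            var meta = input.Value;
            Console.WriteLine($"{input.Key}: {meta.ElementType} [{string.Join(",", meta.Dimensions)}]");
        }
    }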

The results I'm getting back in C# are not very good: no objects are detected at all in my images, whereas the same model works fine in my Python code. I have noticed that in C#, most of the time, I receive an error from ONNX saying that it attempted to divide by zero, and when I don't get that error the objects it detects are just rubbish. Expanding the results in the debugger shows:

    resultsArray        {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue[3]}
        [0]             {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue}
        [1]             {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue}
            ElementType      Int64                              Microsoft.ML.OnnxRuntime.Tensors.TensorElementType
            Name             "2546"                             string
            Value            {"Attempted to divide by zero."}   Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>
            ValueType        ONNX_TYPE_TENSOR                   Microsoft.ML.OnnxRuntime.OnnxValueType
            _disposed        false                              bool
            _mapHelper       null                               Microsoft.ML.OnnxRuntime.MapHelper
            _name            "2546"                             string
            _ortValueHolder  {Microsoft.ML.OnnxRuntime.OrtValueTensor<long>}
            _value           {"Attempted to divide by zero."}   Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<long>
        [2]             {Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue}

Here is the C# code I have so far. I believe the problem lies with the processing of the images and the setup of the tensors, but I don't really know enough about it to be sure.

The model was trained without any explicit normalisation or other preprocessing, just the raw RGB images. Even so, I was able to achieve an average validation IoU approaching 95%.

        private void cmdAnalyse_Click(object sender, EventArgs e)
        {
            // begin analysis
            if (this.txtONNXFile.Text == "")
            {
                MessageBox.Show("Please select an ONNX file");
                return;
            }

            if (this.originalImage == null)
            {
                MessageBox.Show("Please select an image");
                return;
            }

            // flip the width and height dimensions. Images are 720x576, but the model expects 576x720
            this.rescaledImage = new Bitmap(originalImage.Height, originalImage.Width);

            Graphics graphics = Graphics.FromImage(rescaledImage);
            graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
            graphics.DrawImage(originalImage, 0, 0, rescaledImage.Width, rescaledImage.Height);
            graphics.Dispose();

            Microsoft.ML.OnnxRuntime.Tensors.Tensor<float> input = new Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<float>(new[] { 1, 3, 720, 576 });

            BitmapData bitmapData = rescaledImage.LockBits(new System.Drawing.Rectangle(0, 0, rescaledImage.Width, rescaledImage.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

            int stride = bitmapData.Stride;
            IntPtr scan0 = bitmapData.Scan0;

            unsafe
            {
                // Format24bppRgb stores pixels in memory as B, G, R; the bytes
                // are reordered into RGB channel order as they are copied.
                byte* ptr = (byte*)scan0;
                for (int y = 0; y < rescaledImage.Height; y++)
                {
                    for (int x = 0; x < rescaledImage.Width; x++)
                    {
                        int offset = y * stride + x * 3;
                        input[0, 0, y, x] = ptr[offset + 2]; // Red channel
                        input[0, 1, y, x] = ptr[offset + 1]; // Green channel
                        input[0, 2, y, x] = ptr[offset];     // Blue channel
                    }
                }
            }


            rescaledImage.UnlockBits(bitmapData);

            var inputs = new List<Microsoft.ML.OnnxRuntime.NamedOnnxValue>
            {
                Microsoft.ML.OnnxRuntime.NamedOnnxValue.CreateFromTensor("images", input)
            };


            // run inference

            var session = new Microsoft.ML.OnnxRuntime.InferenceSession(this.txtONNXFile.Text);
            Microsoft.ML.OnnxRuntime.IDisposableReadOnlyCollection<Microsoft.ML.OnnxRuntime.DisposableNamedOnnxValue> results = session.Run(inputs);


            // process results
            var resultsArray = results.ToArray();

            float[] boxes = resultsArray[0].AsEnumerable<float>().ToArray();
            long[] labels = resultsArray[1].AsEnumerable<long>().ToArray();
            float[] confidences = resultsArray[2].AsEnumerable<float>().ToArray();
            var predictions = new List<Prediction>();
            var minConfidence = 0.0f;
            for (int i = 0; i < boxes.Length; i += 4)
            {
                var index = i / 4;
                if (confidences[index] >= minConfidence)
                {
                    predictions.Add(new Prediction
                    {
                        Box = new Box(boxes[i], boxes[i + 1], boxes[i + 2], boxes[i + 3]),
                        Label = LabelMap.Labels[labels[index]],
                        Confidence = confidences[index]
                    });
                }
            }


            System.Drawing.Graphics graph = System.Drawing.Graphics.FromImage(this.rescaledImage);

            // Put boxes, labels and confidence on image and save for viewing

            using (System.Drawing.Pen pen = new System.Drawing.Pen(System.Drawing.Color.Red, 2))
            {
                foreach (var p in predictions)
                {
                    graph.DrawRectangle(pen, p.Box.Xmin, p.Box.Ymin, p.Box.Xmax - p.Box.Xmin, p.Box.Ymax - p.Box.Ymin);
                }
            }

            graph.Flush();
            graph.Dispose();

            // rescale image back
            System.Drawing.Bitmap bmpResult = new Bitmap(this.originalImage.Width, this.originalImage.Height);

            graphics = Graphics.FromImage(bmpResult);
            graphics.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic;
            graphics.DrawImage(rescaledImage, 0, 0, originalImage.Width, originalImage.Height);

            graphics.Flush();
            graphics.Dispose();

            //graph.ScaleTransform(720, 576);
            this.pbRibeye.Width = bmpResult.Width;
            this.pbRibeye.Height = bmpResult.Height;    

            this.pbRibeye.Image = bmpResult;

            //bmpResult.Dispose();
            rescaledImage.Dispose();
            results.Dispose();
            session.Dispose();
        }
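
The anonymous-looking output names ("2546" and so on) can be listed in the same way as the inputs. A torchvision Faster R-CNN export should produce three tensors, and the code above assumes the usual order: boxes (float), labels (int64), scores (float). A minimal sketch, reusing the session from the handler:

    // Print the graph's outputs; the names are just exported node ids, but the
    // order is what matters: boxes, labels, scores.
    foreach (var output in session.OutputMetadata)
    {
        var meta = output.Value;
        Console.WriteLine($"{output.Key}: {meta.ElementType} [{string.Join(",", meta.Dimensions)}]");
    }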

And the Python code that works is:

import onnxruntime
import torchvision
from PIL import Image

ort_session = onnxruntime.InferenceSession(ONNXFile)

# Preprocess the input image

image = Image.open(image_path)  # Load the image using PIL
resized_image = image.resize((576, 720))  # PIL resize takes (width, height); if this is omitted I receive an error about the expected input dimensions

transform = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),  # Convert PIL image to a float tensor; this also scales pixel values to [0.0, 1.0]
])

input_tensor = transform(resized_image)
input_tensor = input_tensor.unsqueeze(0)  # Add a batch dimension

# Run the model
outputs = ort_session.run(None, {'images': input_tensor.numpy()})

The Python code produces valid output, in line with the results I was achieving during validation.


Solution

  • It turns out that everything is mostly fine with the code

    I stumbled upon the solution: even though I never added any explicit normalisation when training the model, it needs its input pixels scaled to [0, 1] before it will generate predictions. In hindsight this makes sense: torchvision's ToTensor transform, used in the validation code above (and presumably in the training pipeline as well), already converts images to float tensors with values in [0.0, 1.0], so the model has only ever seen normalised inputs.

    The updated code section is:

        unsafe
        {
            byte* ptr = (byte*)scan0;
            for (int y = 0; y < originalImage.Height; y++)
            {
                for (int x = 0; x < originalImage.Width; x++)
                {
                    int offset = y * stride + x * 3;
                    input[0, 0, y, x] = ptr[offset + 2] / 255.0f; // Red channel
                    input[0, 1, y, x] = ptr[offset + 1] / 255.0f; // Green channel
                    input[0, 2, y, x] = ptr[offset] / 255.0f;     // Blue channel
                }
            }
        }
    

    This normalises each pixel as it is copied into the tensor, reordering the bitmap's BGR byte layout into the RGB channel order the model expects.

    The reason I can use originalImage now is that I was able to remove all the extra bitmaps from the code: I realised that a DenseTensor is ultimately just a flat buffer of values, much like a bitmap, so the size/orientation of the image doesn't affect the result as long as the tensor contains the correct number of elements - at least that seems to be the case in C#. The Python side does seem to care about the dimensions being correct.
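
    To illustrate the flat-buffer point, here is a minimal sketch of filling a DenseTensor through its backing Buffer directly rather than through the four-dimensional indexer; ReadPixel is a hypothetical stand-in for whatever per-pixel lookup is already in place:

        // NCHW layout: the tensor is one contiguous float buffer of length 1*3*H*W,
        // and the declared shape only determines how indices map into it.
        const int H = 720, W = 576;
        var tensor = new Microsoft.ML.OnnxRuntime.Tensors.DenseTensor<float>(new[] { 1, 3, H, W });
        Span<float> buffer = tensor.Buffer.Span;

        for (int c = 0; c < 3; c++)
            for (int y = 0; y < H; y++)
                for (int x = 0; x < W; x++)
                    // Flat index for [0, c, y, x] = ((0 * 3 + c) * H + y) * W + x.
                    buffer[(c * H + y) * W + x] = ReadPixel(c, y, x) / 255.0f; // ReadPixel: hypothetical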