Tags: python, matlab, tensorflow, conv-neural-network, onnx

Why is my CNN extremely slow in Python compared to Matlab?


I have trained a CNN in Matlab 2019b that classifies images between three classes. When this CNN was tested in Matlab it worked fine and only took 10-15 seconds to classify an image. I used the exportONNXNetwork function in Matlab so that I can run my CNN with TensorFlow. This is the code I am using to run the ONNX file in Python:

import onnx
from onnx_tf.backend import prepare 
import numpy as np
from PIL import Image 

onnx_model = onnx.load('trainednet.onnx')
tf_rep = prepare(onnx_model)
filepath = 'filepath.png' 

# Load the image, convert it to RGB, and rearrange it into an NCHW batch of one
img = Image.open(filepath).resize((224,224)).convert("RGB")
img = np.array(img).transpose((2,0,1))
img = np.expand_dims(img, 0)
img = img.astype(np.uint8)

probabilities = tf_rep.run(img) 
print(probabilities) 
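
For reference, this small sketch (not part of my classification script above) is how I can check what input name, data type, and shape the exported model expects, using the standard ONNX protobuf fields:

import onnx

model = onnx.load('trainednet.onnx')
for inp in model.graph.input:
    # Each model input records its element type and (possibly symbolic) shape
    dtype = onnx.TensorProto.DataType.Name(inp.type.tensor_type.elem_type)
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dtype, dims)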

When I use this classification code on the same test set, it seems to classify the images correctly, but it is very slow and it freezes my computer, with memory usage reaching 95%+ at some points.

I also noticed that while classifying, the command prompt prints this:

2020-04-18 18:26:39.214286: W tensorflow/core/grappler/optimizers/meta_optimizer.cc:530] constant_folding failed: Deadline exceeded: constant_folding exceeded deadline., time = 486776.938ms.

Is there any way I can make this Python code classify faster?


Solution

  • In this case, it appears that the Grappler optimization suite has encountered some kind of infinite loop or memory leak. I would recommend filing an issue against the GitHub repo.

    It's challenging to debug why constant folding is taking so long, but you may have better performance using the ONNX TensorRT backend instead of the TensorFlow backend: on Nvidia GPUs it achieves better runtime performance while also compiling typical graphs more quickly. Constant folding usually doesn't provide large speedups for well-optimized models anyway.

    import onnx
    import onnx_tensorrt.backend as backend
    import numpy as np
    from PIL import Image

    # Load the exported network and build a TensorRT engine for it
    model = onnx.load("trainednet.onnx")
    engine = backend.prepare(model, device='CUDA:1')

    filepath = 'filepath.png'

    # Same preprocessing as in the question: RGB image rearranged into an NCHW batch of one
    img = Image.open(filepath).resize((224,224)).convert("RGB")
    img = np.array(img).transpose((2,0,1))
    img = np.expand_dims(img, 0)
    img = img.astype(np.uint8)
    output_data = engine.run(img)[0]
    print(output_data)
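
    The raw output is just whatever tensor the exported network produces. If the network's final layer already outputs scores or probabilities for the three classes, a small post-processing sketch like the one below turns them into a predicted label. This is only for illustration: the class names and their order are placeholders, not something read from the model.

    # Hypothetical post-processing: pick the most likely of the three classes.
    # Replace the placeholder names with the class order used when training in Matlab.
    class_names = ['class_a', 'class_b', 'class_c']
    scores = np.asarray(output_data).reshape(-1)
    predicted = class_names[int(np.argmax(scores))]
    print(predicted, scores)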