startTime = time.time()
blob = cv2.dnn.blobFromImage(img, float(1.0/255.0), (frameWidth,frameHeight), (0,0,0), swapRB = True, crop = False)
yolo.setInput(blob)
layerOutput = yolo.forward(outputLayers)
endTime = time.time()
This is the Python code whose time I am measuring.
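As a side note, a single timed call can be misleading: the first forward pass often pays one-off initialization costs, so warming up and averaging gives steadier numbers. A minimal sketch using only the standard library; the placeholder lambda stands in for a call such as `yolo.forward(outputLayers)`:

```python
import time

def benchmark(fn, warmup=3, runs=10):
    """Time fn() after discarding warm-up calls; returns mean ms per run."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs

# Placeholder workload; in the question this would wrap the forward pass,
# e.g. benchmark(lambda: yolo.forward(outputLayers))
print(benchmark(lambda: sum(range(10000))))
```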
auto start = chrono::steady_clock::now();
blob = blobFromImage(images[i], 1.0f/255.0f, Size(frameWidth, frameHeight), Scalar(0,0,0), true, false);
net.setInput(blob);
net.forward(layerOutput, getOutputsNames(net));
auto end = chrono::steady_clock::now();
This is the C++ code whose time I am measuring.
In C++: blob is of type Mat, layerOutput is of type vector<Mat>, and getOutputsNames returns the layer names as a vector<string>.
In Python: blob is of type numpy.ndarray, layerOutput is of type tuple, and outputLayers is a list.
Both runs use the same backend and target (backend: OpenCV, target: CPU), and the same YOLOv4 weight and config files from the same directories.
When measuring the time, the forward pass takes ~180-200 ms in Python but ~220-250 ms in C++. Since C++ is a compiled language, I expected it to run at least as fast as Python, which surprisingly is not the case.
What might be the reason that Python runs faster than C++ here, and what would you suggest to fix it?
Thanks in advance!
I figured out what the problem was. I had built a custom OpenCV for C++ to take advantage of the CUDA cores on my Jetson Orin, while Python was using a stock OpenCV installed in another directory, without CUDA support. In that custom build I had also changed the CPU parallelization settings, which turned out to be slower than the defaults. When I switched the C++ side to the stock OpenCV build, it ran as fast as expected.
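For anyone hitting the same mismatch: a quick way to check which OpenCV installation Python is actually loading, and whether that build reports CUDA support, is `cv2.__file__` together with `cv2.getBuildInformation()`. A small sketch (the `cuda_enabled` helper is my own name, not an OpenCV API; the guarded import lets the helper run even where OpenCV is absent):

```python
def cuda_enabled(build_info: str) -> bool:
    """True if the cv2.getBuildInformation() text reports a CUDA: YES line."""
    for line in build_info.splitlines():
        if "CUDA:" in line:
            return "YES" in line
    return False

try:
    import cv2
    print(cv2.__file__)                           # which installation got imported
    print(cuda_enabled(cv2.getBuildInformation()))  # does that build have CUDA?
except ImportError:
    pass  # OpenCV not installed in this environment
```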