I'm trying to use the cvlib package, which uses a YOLOv3 model to recognize objects in images, on Windows 10. Let's take a simple example:
import cv2
import cvlib as cv
import time
from cvlib.object_detection import draw_bbox
img = cv2.imread('image.jpg')  # placeholder path; any test image works
inittimer = time.time()
bbox, label, conf = cv.detect_common_objects(img, confidence=0.5, model='yolov3-worker', enable_gpu=True)
print('The process took %.3f s' % (time.time() - inittimer))
output_image = draw_bbox(img, bbox, label, conf)
The detection takes ~60 ms.
cvlib uses OpenCV to compute this CNN part.
If I now check, via subprocess, how much GPU memory TensorFlow uses, it takes only 824 MiB.
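For reference, a reading like this can be taken from Python roughly as follows. This is only a sketch, not the exact code I ran: the helper names `parse_memory` and `gpu_memory` are made up for illustration, and it assumes `nvidia-smi` is on the PATH.

```python
import subprocess


def parse_memory(csv_text):
    """Parse 'used, total' lines in MiB (one line per GPU) into tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory():
    """Ask nvidia-smi for used and total memory of every GPU, in MiB."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory(out)
```

Calling `gpu_memory()` before and after a detection gives a rough idea of how much memory the model itself occupies.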
While the program runs, if I start nvidia-smi it gives me this result:
As you can see, there is much more memory available here. My question is simple: why doesn't cvlib (and so TensorFlow) use all of it to improve the detection time?
EDIT:
As far as I understand, cvlib uses TensorFlow, but it also uses the OpenCV detector. I built OpenCV with CMake and CUDA 10.2. I don't understand why, but nvidia-smi reports CUDA Version: 11.0, which is not what I installed. Maybe that's part of the problem?
I got it! The problem comes from the fact that I created a new Net object for each iteration.
Here is the related issue on GitHub where you can follow it: https://github.com/opencv/opencv/issues/16348
With a custom function, it now runs at ~60 fps. Be aware that cvlib may not be designed for real-time computation.
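A minimal sketch of the idea, not my exact function: build the Net once and reuse it for every frame. The names `CachedDetector`, `load_yolo_cuda` and `detect`, the `yolov3.cfg` / `yolov3.weights` paths, and the 416x416 input size are assumptions for illustration. The CUDA backend constants need an OpenCV build (4.2 or later) compiled with CUDA support.

```python
class CachedDetector:
    """Create the network on first use, then hand back the same object."""

    def __init__(self, load_net):
        self._load_net = load_net  # callable that builds the Net
        self._net = None

    def net(self):
        if self._net is None:
            self._net = self._load_net()  # expensive: runs only once
        return self._net


def load_yolo_cuda(cfg="yolov3.cfg", weights="yolov3.weights"):
    """Load a Darknet model and ask OpenCV to run it on the GPU.

    The file paths are hypothetical placeholders.
    """
    import cv2  # imported here so CachedDetector itself has no dependency

    net = cv2.dnn.readNetFromDarknet(cfg, weights)
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    return net


def detect(detector, frame, size=(416, 416)):
    """Run one forward pass on a BGR frame, reusing the cached Net."""
    import cv2

    net = detector.net()
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, size, swapRB=True, crop=False)
    net.setInput(blob)
    # Raw YOLO outputs; thresholding and NMS still have to be applied.
    return net.forward(net.getUnconnectedOutLayersNames())
```

Typical use would be `detector = CachedDetector(load_yolo_cuda)` once before the loop, then `detect(detector, frame)` for each frame, so only the first call pays for loading the model and initializing the GPU.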