OpenCV doesn't use all GPU memory

I'm trying to use the cvlib package, which uses a YOLOv3 model to recognize objects in images, on Windows 10. Let's take a simple example:

import cvlib as cv
import time
from cvlib.object_detection import draw_bbox


# img is assumed to be an image already loaded as a NumPy array (e.g. with cv2.imread)
inittimer = time.time()
bbox, label, conf = cv.detect_common_objects(img, confidence=0.5, model='yolov3-worker', enable_gpu=True)
print('The process took %.3f s' % (time.time() - inittimer))

output_image = draw_bbox(img, bbox, label, conf)

The detection call takes ~60 ms.

cvlib uses OpenCV to run the CNN part.

If I now check how much GPU memory TensorFlow uses, via subprocess, it takes only 824 MiB.
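For reference, a minimal sketch of how such a check might be done from Python. It assumes nvidia-smi is on the PATH. The function names `query_gpu_memory` and `parse_gpu_memory` are only illustrative. Note that this query reports memory used on the whole GPU, not by one process.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (MiB), one line per GPU, into int tuples."""
    usage = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        usage.append((used, total))
    return usage


def query_gpu_memory():
    """Ask nvidia-smi for used and total memory (MiB) of every GPU."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_gpu_memory(output)


# Example (needs an NVIDIA driver installed):
# for index, (used, total) in enumerate(query_gpu_memory()):
#     print("GPU %d: %d MiB used of %d MiB" % (index, used, total))
```

Calling `query_gpu_memory()` before and after the detection gives a rough idea of how much memory the model takes.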

While the program is running, nvidia-smi gives me this result:

As you can see, there is much more memory available. My question is simple: why doesn't cvlib (and so TensorFlow) use all of it to improve the detection time?

EDIT:

As far as I understand, cvlib uses TensorFlow, but it also uses the OpenCV detector. I built OpenCV with CMake and CUDA 10.2. I don't understand why, but nvidia-smi shows "CUDA Version: 11.0", which is not the version I installed. Maybe that's part of the problem?

Solution

I got it! The problem came from the fact that I was creating a new Net object for each iteration.

Here is the related issue on GitHub where you can follow it: https://github.com/opencv/opencv/issues/16348

With a custom function, it now runs at ~60 fps. Be aware that cvlib may not be designed for real-time computation.
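A rough sketch of the idea, not the exact function I used: build the Net once and reuse it for every frame. The names `build_yolo_net`, `ReusableNet` and `run_forward` are hypothetical, as are the file names in the usage comment. The 416x416 input size is an assumption (it is the usual YOLOv3 size), and the CUDA backend needs an OpenCV build with CUDA support.

```python
def build_yolo_net(weights_path, config_path, use_gpu=True):
    """Create the OpenCV DNN network. This is the expensive step."""
    import cv2  # imported here so the caching helper below works without it

    net = cv2.dnn.readNet(weights_path, config_path)
    if use_gpu:
        # Only effective if OpenCV was compiled with CUDA support.
        net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    return net


class ReusableNet:
    """Builds the network on first use and returns the same object afterwards."""

    def __init__(self, factory=build_yolo_net):
        self._factory = factory
        self._net = None

    def get(self, *args, **kwargs):
        # Arguments are only used on the first call, when the net is built.
        if self._net is None:
            self._net = self._factory(*args, **kwargs)
        return self._net


def run_forward(net, img, input_size=(416, 416)):
    """Run one forward pass and return the raw YOLO output layers."""
    import cv2

    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, input_size, swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward(net.getUnconnectedOutLayersNames())


# Usage (hypothetical file names):
# holder = ReusableNet()
# for frame in frames:
#     net = holder.get("yolov3.weights", "yolov3.cfg")  # built only once
#     outputs = run_forward(net, frame)
```

The raw outputs still need the usual post-processing (confidence threshold, non-maximum suppression), which is left out here. The point is only that the Net is created a single time, outside the per-frame loop.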