python-3.x keras object-detection tensorflow2.0 yolo

Speeding up and understanding Python Keras predict method results analysis

I'm using Keras and TensorFlow to perform object detection with standard YOLOv3 as well as YOLOv3-Tiny (about 10x faster). Everything works, but performance is fairly poor: I'm getting about one frame every 2 seconds on the GPU and one frame every 4 seconds or so on the CPU. Profiling the code shows that the decode_netout method takes a lot of the time. I was generally following this tutorial as an example.

Can someone help walk me through what it's doing?
Are there alternative methods baked into Tensorflow (or other libraries) that could do these calculations? I swapped out some custom Python for tf.image.non_max_suppression for example and it helped out quite a bit in terms of performance.

# https://keras.io/models/model/
yhat = model.predict(image, verbose=0, use_multiprocessing=True)
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
    # decode the output of the network
    boxes += detect.decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)

def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
    grid_h, grid_w = netout.shape[:2]
    nb_box = 3
    netout = netout.reshape((grid_h, grid_w, nb_box, -1))
    boxes = []
    netout[..., :2]  = _sigmoid(netout[..., :2])
    netout[..., 4:]  = _sigmoid(netout[..., 4:])
    netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh

    for i in range(grid_h*grid_w):
        row = i // grid_w  # integer row index; plain / yields a float in Python 3
        col = i % grid_w
        for b in range(nb_box):
            # the element at index 4 is the objectness score
            objectness = netout[int(row)][int(col)][b][4]
            if(objectness.all() <= obj_thresh): continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[int(row)][int(col)][b][:4]
            x = (col + x) / grid_w # center position, unit: image width
            y = (row + y) / grid_h # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
            # last elements are class probabilities
            classes = netout[int(row)][col][b][5:]
            box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
            boxes.append(box)
    return boxes

Solution

I have a similar setup with a GPU and have been facing the same problem. I have been working on a YOLOv3 Keras project and have been chasing this exact issue for the past 2 weeks. After timing all my functions, I narrowed the issue down to 'def do_nms', which then led me to the function you posted above, 'def decode_netout'. The non-max suppression is slow because the objectness check in decode_netout almost never skips a box: objectness.all() collapses the score to a boolean before the comparison, so nearly every candidate box is passed on to NMS.

The solution I found was changing this line

if(objectness.all() <= obj_thresh): continue

to

if (objectness <= obj_thresh).all(): continue
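
To make the difference between the two checks concrete, here is a minimal sketch. The score 0.1 is a made-up value that should clearly be filtered out at a threshold of 0.6; the variable names old_skip and new_skip are only for illustration.

```python
import numpy as np

obj_thresh = 0.6
objectness = np.float32(0.1)  # made-up low score that should be skipped

# Original check: .all() turns the score into a boolean first, so the
# comparison becomes True <= 0.6 (that is, 1 <= 0.6), which is False.
old_skip = bool(objectness.all() <= obj_thresh)

# Corrected check: compare against the threshold first, then reduce.
new_skip = bool((objectness <= obj_thresh).all())

print(old_skip, new_skip)  # False True
```

With the original check the box is kept even though its score is far below the threshold, which is why so many boxes end up in the NMS step.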

The performance difference is night and day. I am now getting close to 30 FPS, and everything is working much better.

Credit goes to this Git issue/solution:

https://github.com/experiencor/keras-yolo3/issues/177

It took me a while to figure this out, so I hope this helps others.
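
If the Python loop in decode_netout is still a bottleneck after the fix, one option is to let NumPy broadcasting do the per-cell arithmetic. The following is only a sketch of that idea, not the tutorial's code: the name decode_netout_vectorized is hypothetical, it returns plain arrays instead of BoundBox objects, and it assumes the same output layout as above (x, y, w, h, objectness, then class scores, for 3 anchors per cell).

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_netout_vectorized(netout, anchors, obj_thresh, net_h, net_w, nb_box=3):
    """Hypothetical loop-free variant; returns (boxes, objectness, class_scores)."""
    grid_h, grid_w = netout.shape[:2]
    # Work on a float64 copy so the caller's array is not modified.
    netout = netout.reshape((grid_h, grid_w, nb_box, -1)).astype(np.float64)

    xy = _sigmoid(netout[..., :2])          # offsets inside each grid cell
    wh = netout[..., 2:4]                   # raw width/height terms
    objectness = _sigmoid(netout[..., 4])   # shape (grid_h, grid_w, nb_box)
    class_scores = objectness[..., np.newaxis] * _sigmoid(netout[..., 5:])
    class_scores = class_scores * (class_scores > obj_thresh)

    # Grid indices, shaped so they broadcast over (grid_h, grid_w, nb_box).
    col = np.arange(grid_w).reshape(1, grid_w, 1)
    row = np.arange(grid_h).reshape(grid_h, 1, 1)
    x = (col + xy[..., 0]) / grid_w         # center x, unit: image width
    y = (row + xy[..., 1]) / grid_h         # center y, unit: image height

    # anchors is a flat [w0, h0, w1, h1, ...] list, as in the original code.
    anchors = np.asarray(anchors, dtype=np.float64).reshape(nb_box, 2)
    w = anchors[:, 0] * np.exp(wh[..., 0]) / net_w
    h = anchors[:, 1] * np.exp(wh[..., 1]) / net_h

    # Keep only boxes whose objectness exceeds the threshold.
    keep = objectness > obj_thresh
    boxes = np.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2], axis=-1)
    return boxes[keep], objectness[keep], class_scores[keep]
```

The returned boxes array has one row per kept box in (xmin, ymin, xmax, ymax) order, so the rest of the pipeline would need to be adapted to consume arrays rather than BoundBox objects.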