I have a well-trained ssd320x320 TensorFlow model from the TensorFlow model zoo. The reports are pretty good: the train log shows a low loss, and the eval log shows that 7 out of 9 test images were detected successfully. The model was trained on a GPU and saved as ckpt3.
The goal is to detect when a person "likes" with their hand.
Loading a model from its last checkpoint works well, and I achieved detection with the following function:
def test1(self):
    # Works great
    for img_path in glob.glob("test_dir\*.jpg"):
        plt.figure()
        plt.imshow(self.get_image_np_with_detections(self._load_image_into_numpy_array(img_path)))
        plt.show()
    # Note that get_image_np_with_detections() is the detection @tf.function()
    # as it is written in the TensorFlow documentation, with no changes.
    # _load_image_into_numpy_array() simply returns np.array(Image.open(path))
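For reference, here is a rough sketch of what those helpers look like, following the TF Object Detection API tutorial pattern (names such as self.detect_fn and self.category_index are placeholders for illustration, not my exact code):

import numpy as np
import tensorflow as tf
from PIL import Image
from object_detection.utils import visualization_utils as viz_utils

def _load_image_into_numpy_array(self, path):
    # Reads the image file into an (H, W, 3) uint8 RGB array
    return np.array(Image.open(path))

def get_image_np_with_detections(self, image_np):
    # self.detect_fn is the @tf.function detection step restored from the checkpoint
    input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
    detections = self.detect_fn(input_tensor)

    num_detections = int(detections.pop('num_detections'))
    detections = {k: v[0, :num_detections].numpy() for k, v in detections.items()}
    detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

    image_np_with_detections = image_np.copy()
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np_with_detections,
        detections['detection_boxes'],
        detections['detection_classes'] + 1,  # label id offset
        detections['detection_scores'],
        self.category_index,
        use_normalized_coordinates=True,
        min_score_thresh=0.5)
    return image_np_with_detections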
Object detection on images was successfully achieved in test1().
The problem is that I fail to detect objects in webcam frames.
From another function, which opens my webcam, I call the same detection function on each frame. This function fails: not a single green detection box appears on the screen.
def open_webcam(self):
    # Doesn't show detection green boxes at all
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, image_np = cap.read()
        im_detected = self.get_image_np_with_detections(image_np)
        cv2.imshow('object detection', cv2.resize(im_detected, (800, 600)))
        # release, destroy...
During my debugging, I saved screenshots from my webcam while running the open_webcam() function (a screenshot every 1-2 seconds). The screenshots were saved into test_dir and then processed with test1(). That test was successful: the screenshots were marked with a green detection box (hand-like-sign).
This indicates that the problem lies in the way I pass frames to the detection function, since the frames were detected successfully with the test1() approach, but not in real time. To summarize:
- Webcam screenshots were saved into test_dir, each as a unique-id.jpg, and processed in test1() (9/10 screenshots were detected).
- Passing the webcam frames directly as a numpy array was tried, with no luck.
- The model was trained to detect the Like (hand-sign) from .jpg files. The eval report looks good.
- Python is 3.7, TF is 2.7, CV is 4.5.5.
Thanks in advance!
The TensorFlow model was most likely trained on RGB images, while cv2 reads frames in BGR order. Try
image_np = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
Also, the model may have been trained on normalized images, so if changing BGR to RGB doesn't help, try
image_np = image_np / 255.
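Putting this together, your open_webcam() loop might look roughly like the sketch below (reusing your get_image_np_with_detections(); the /255. step is only needed if the model really expects [0, 1] inputs, and the frame is converted back to BGR only for display):

import cv2

def open_webcam(self):
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ret, frame_bgr = cap.read()
        if not ret:
            break

        # OpenCV delivers BGR frames; the detector expects RGB
        image_np = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

        # Uncomment only if the model was trained on normalized [0, 1] inputs:
        # image_np = image_np / 255.

        im_detected = self.get_image_np_with_detections(image_np)

        # Convert back to BGR so imshow displays correct colors
        cv2.imshow('object detection',
                   cv2.resize(cv2.cvtColor(im_detected, cv2.COLOR_RGB2BGR), (800, 600)))
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()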