python, image, tensorflow, opencv, video-processing

Make prediction faster after training [increasing predicted video FPS]


I trained a segmentation model with MobileNetV3Large, but its processing speed at prediction time is poor: approximately 3.95 FPS.

I want to reach at least 20 FPS. Example code is attached below. Thanks!

from imutils.video import VideoStream
from imutils.video import FPS
import numpy as np
import imutils
import time
import cv2
import tensorflow as tf
from tensorflow.keras.models import load_model


# loss and dice_coefficient are the custom functions used during training
model = load_model('model.h5', custom_objects={'loss': loss, "dice_coefficient": dice_coefficient}, compile=False)

cap = VideoStream(src=0).start()
# warm up the camera for a couple of seconds
time.sleep(2.0)

# Start the FPS timer
fps = FPS().start()

while True:

    frame = cap.read()

    # Resize each frame to the model's input size
    resized_image = cv2.resize(frame, (256, 256))

    # Normalize to [0, 1] and cast to float32
    resized_image = tf.image.convert_image_dtype(resized_image / 255.0, dtype=tf.float32).numpy()
    # Run the segmentation model on the first three channels
    mask = model.predict(np.expand_dims(resized_image[:, :, :3], axis=0))[0]

    # show the predicted mask
    cv2.imshow("Frame", mask)

    key = cv2.waitKey(1) & 0xFF
    # Press 'q' key to break the loop
    if key == ord("q"):
        break

    # update the FPS counter
    fps.update()

# stop the timer
fps.stop()

# Display FPS Information: Total Elapsed time and an approximate FPS over the entire video stream
print("[INFO] Elapsed Time: {:.2f}".format(fps.elapsed()))
print("[INFO] Approximate FPS: {:.2f}".format(fps.fps()))

# Destroy windows and cleanup
cv2.destroyAllWindows()
# Stop the video stream
cap.stop()

EDIT-1

After doing float16 quantization, I loaded the model as tflite_model and fed the input image into it. But the result was even slower! Is this the correct approach?

interpreter = tf.lite.Interpreter('tflite_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

....................... process .............

while True:
    
    .............  process ............
    
    interpreter.set_tensor(input_details[0]['index'], np.expand_dims(resized_image[:,:,:3], axis=0))
    interpreter.invoke()
    mask = interpreter.get_tensor(output_details[0]['index'])[0]
#     mask = model.predict(np.expand_dims(resized_image[:,:,:3], axis=0))[0]

    ............ display part ........

Solution

  • Inference can be made faster in several ways:

    1. Model quantization:

      TensorFlow Lite supports converting weights to 16-bit floating point (tf.float16). Post-training float16 quantization is probably the easiest option, since it only needs the already-trained model. Retraining with quantization in mind (float16, or full int8 quantization) can be faster still: https://www.tensorflow.org/lite/performance/post_training_float16_quant
      A minimal conversion sketch follows this list.

    2. Model distillation: you can train a small student model against the predictions of your big model (using the same loss), so it learns to reproduce the big model's behavior at much lower cost. A minimal distillation sketch follows the list.

    3. Model pruning: you can compress your model with pruning, which zeroes out low-magnitude weights; the pruned model is smaller and can run faster. Pruning is covered in the TensorFlow Model Optimization documentation. A minimal pruning sketch follows the list.
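
For option 1, a minimal post-training float16 conversion sketch, assuming the same model.h5 and tflite_model.tflite file names used in the question (the recipe itself follows the linked TFLite guide):

import tensorflow as tf
from tensorflow.keras.models import load_model

# compile=False avoids needing the custom loss/metric definitions here
model = load_model('model.h5', compile=False)

# Convert to TFLite with float16 weight quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open('tflite_model.tflite', 'wb') as f:
    f.write(tflite_model)

When running the converted model on a desktop CPU, note that tf.lite.Interpreter also accepts a num_threads argument, which can make a noticeable difference if the interpreter defaults to a single thread.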
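
For option 2, a minimal distillation sketch. The student architecture and the placeholder training images are illustrative, and it assumes the teacher outputs a single-channel 256x256 mask; none of this is from the question:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Small illustrative student network (much cheaper than MobileNetV3Large)
def build_student(input_shape=(256, 256, 3)):
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(16, 3, strides=2, padding='same', activation='relu')(inputs)
    x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu')(x)
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)
    return Model(inputs, outputs)

# Frozen teacher (compile=False so custom loss/metric definitions are not needed)
teacher = tf.keras.models.load_model('model.h5', compile=False)
teacher.trainable = False

student = build_student()
student.compile(optimizer='adam', loss='binary_crossentropy')

# Placeholder data: replace with your real training frames
images = np.random.rand(8, 256, 256, 3).astype('float32')

# The teacher's soft masks become the student's training targets
soft_masks = teacher.predict(images, batch_size=4)
student.fit(images, soft_masks, epochs=10, batch_size=4)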
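
For option 3, a minimal pruning sketch using the tensorflow-model-optimization package. The sparsity schedule values are illustrative, train_images/train_masks stand in for your own training data, and depending on the architecture some layers may need to be excluded from pruning:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.models.load_model('model.h5', compile=False)

# Progressively zero out 50% of the low-magnitude weights while fine-tuning
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.5,
    begin_step=0,
    end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(optimizer='adam', loss='binary_crossentropy')

# Fine-tune with the pruning callback on your own data, e.g.:
# pruned_model.fit(train_images, train_masks, epochs=2,
#                  callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before saving or converting to TFLite
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)
final_model.save('pruned_model.h5')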