python image-processing conv-neural-network video-capture opencv

How to get each frame as an image from a cv2.VideoCapture in Python


I want to get each frame from a video as an image. The background is as follows: I have written a neural network that can recognize hand signs. Now I want to start a video stream in which each frame is passed through the network. To fit the network's input, I want to process each frame and reduce it to 28*28 pixels. In the end it should look similar to this: https://www.youtube.com/watch?v=JfSao30fMxY I have searched the web and found that I can use cv2.VideoCapture to get the stream. But how can I pick out each frame, process it, and print the result back on the screen? My code so far looks like this:

import numpy as np
import cv2
import tensorflow as tf  # needed by imageToLabel() below

cap = cv2.VideoCapture(0)

# Todo: each Frame/Image from the video should be saved as a variable and open imageToLabel()
# Todo: before the image is handed to the method, it needs to be translated into a 28*28 np Array
# Todo: the returned Label should be printed onto the video (otherwise it can be )

i = 0
while True:
    # Capture frame-by-frame
    # Load the model once and pass it as a parameter

    ret, frame = cap.read()
    if not ret:
        # No frame could be read (camera unavailable or stream ended)
        break
    i += 1

    # cv2.imwrite returns True/False (write success), not the image
    cv2.imwrite('database/{index}.png'.format(index=i), frame)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

def imageToLabel(imgArr, checkpointLoad):
    new_model = tf.keras.models.load_model(checkpointLoad)
    imgArrNew = imgArr.reshape(1, 28, 28, 1) / 255
    prediction = new_model.predict(imgArrNew)
    label = np.argmax(prediction)
    return label


Solution

  • frame is the BGR image you get from the stream (OpenCV stores color channels in BGR order, not RGB). gray is the grayscale-converted image. Judging by its input shape, I suppose your network takes grayscale images. Therefore you first need to resize the image to (28,28) and then pass it to your imageToLabel function:

    resizedImg = cv2.resize(gray,(28,28))
    label = imageToLabel(resizedImg,yourModel)
    

    Now that you know the prediction, you can draw it on the frame using e.g. cv2.putText(), which draws directly onto the image you pass in and also returns it. Then show the annotated frame with cv2.imshow().

    edit:

    If you want to use parts of the image for your network you can slice the image like this:

    slicedImg = gray[50:150,50:150]
    resizedImg = cv2.resize(slicedImg,(28,28))
    label = imageToLabel(resizedImg,yourModel)
    

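    If you're not that familiar with indexing in Python, you might want to read up on slice notation. Note that for image arrays the first slice selects rows (y) and the second selects columns (x).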
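The slice gray[50:150,50:150] happens to use the same range on both axes, which hides the axis order. The small sketch below uses a non-square box to make the order visible; the array size and the box coordinates are made-up values for illustration only.

```python
import numpy as np

# Stand-in for a grayscale frame: 480 rows (height) by 640 columns (width).
gray = np.zeros((480, 640), dtype=np.uint8)

x1, y1, x2, y2 = 50, 100, 150, 300  # hypothetical box corners, as (x, y)

# Rows (y) come first, columns (x) second.
roi = gray[y1:y2, x1:x2]
print(roi.shape)  # (200, 100): height 200, width 100

# Basic slicing returns a view, not a copy: writing into roi changes gray too.
roi[:] = 255
print(gray[100, 50], gray[99, 50])  # 255 0
```

Because the slice is a view, call roi.copy() if you want to modify the cropped region without touching the original frame.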

    Also, if you want it to look like the linked video, you can draw a green rectangle, (0,255,0) in BGR, from e.g. (50,50) to (150,150):

    cv2.rectangle(frame,(50,50),(150,150),(0,255,0))