How to get each frame as an image from a cv2.VideoCapture in python

I want to get each frame of a video as an image. The background is as follows: I have written a neural network that can recognize hand signs. Now I want to start a video stream in which each frame is passed through the neural network. To fit the network's input, I want to take each frame and reduce it to 28*28 pixels. In the end it should look similar to this:

I have searched the web and found that I can use cv2.VideoCapture to get the stream. But how can I pick out each frame, process it, and print the result back onto the screen? My code looks like this so far:

import numpy as np
import cv2
import tensorflow as tf  # needed for tf.keras in imageToLabel()

cap = cv2.VideoCapture(0)

# Todo: each Frame/Image from the video should be saved as a variable and open imageToLabel()
# Todo: before the image is handed to the method, it needs to be translated into a 28*28 np Array
# Todo: the returned Label should be printed onto the video (otherwise it can be )

i = 0
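while True:
    # Capture frame-by-frame
    # Load model once and pass it as a parameter

    ret, frame = cap.read()
    if not ret:
        # No frame was delivered (e.g. camera unavailable), so stop
        break
    i += 1

    # Save the frame to disk; cv2.imwrite returns a success flag, not the image
    cv2.imwrite('database/{index}.png'.format(index=i), frame)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()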

def imageToLabel(imgArr, checkpointLoad):
    new_model = tf.keras.models.load_model(checkpointLoad)
    imgArrNew = imgArr.reshape(1, 28, 28, 1) / 255
    prediction = new_model.predict(imgArrNew)
    label = np.argmax(prediction)
    return label


  • frame is the color image you get from the stream (OpenCV delivers it in BGR channel order, not RGB). gray is the grayscale-converted image. I suppose your network takes grayscale images because of its input shape. Therefore you need to first resize the image to (28,28) and then pass it to your imageToLabel function:

    resizedImg = cv2.resize(gray,(28,28))
    label = imageToLabel(resizedImg,yourModel)

    Now that you know the prediction, you can draw it on the frame using e.g. cv2.putText() and then show the frame it returns instead of the original frame.


    If you want to use parts of the image for your network you can slice the image like this:

    slicedImg = gray[50:150,50:150]
    resizedImg = cv2.resize(slicedImg,(28,28))
    label = imageToLabel(resizedImg,yourModel)

    If you're not that familiar with indexing in python you might want to take a look at this

    Also, if you want it to look like the linked video, you can draw a green (0,255,0) rectangle from e.g. (50,50) to (150,150) with cv2.rectangle().
