Tags: python, python-3.x, opencv, artificial-intelligence, object-detection

Python OpenCV - while saving video based on a specific object condition, not all such frames are getting saved


I am using OpenCV in Python and trying to record/save only those frames from a video in which a particular type of object/label is present, for example 'umbrella'.

Issues:

It correctly starts saving frames from the point where it first finds the mentioned object/label in a frame, but if that object/label is absent for the next few frames and only reappears a few frames later, those later frames do not get saved to the mp4 file I am writing.

It only saves the first continuous run of frames containing the mentioned object and does not save the later ones.

After reading the suggestions at this link, I edited the code by putting the frame-writing steps inside a for-loop, as shown below: OpenCV - Save video segments based on certion condition

The frame-writing piece of code that I have tried to improve:

# saving video frame by frame             
for frame_numb in range(total_frames):                
    if i == '':
        pass
    else:
        if "umbrella" in label:
            print("umbrella in labels")

            # Issue causing part where I may need some change
            out_vid.write(frame[frame_numb])

Result of the above code changes:

It creates only a 256 KB file, and the file fails to open / nothing actually gets written.

If I make the change below instead, it saves only the first frame where the condition is met and repeats that same frame for the whole duration:

    # saving video frame by frame             
    for frame_numb in range(total_frames):                
        if i == '':
            pass
        else:
            if "umbrella" in label:
                print("umbrella in labels")

                # Issue causing part where I may need some change
                out_vid.write(frame)
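
I think what this second attempt boils down to is the reduced snippet below (the file name and frame size are made up purely for illustration): frame is a single image array, so writing it inside range(total_frames) just repeats that one picture instead of stepping through the video.

    import cv2
    import numpy as np

    # Minimal reproduction of the behaviour: `frame` is one image (a NumPy array),
    # so the loop writes the same picture over and over instead of new video frames.
    frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for one decoded frame
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')
    out_vid = cv2.VideoWriter('repeat_demo.mp4', fourcc, 20.0, (640, 480))

    total_frames = 100
    for frame_numb in range(total_frames):
        out_vid.write(frame)                           # same array written 100 times

    out_vid.release()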

Sharing a bigger chunk of the code below for reference:

def vid_objects_detection(type=0, confidence_threshold=0.5, image_quality=416):

    classes = []

    # reading category names from coco text file and inserting in classes list
    with open("coco.names", "r") as f:
        classes = [line.strip() for line in f.readlines()]

    net = cv2.dnn.readNet("yolov3-tiny.weights", "yolov3-tiny.cfg") # using tiny versions of weights & config file

    layer_names = net.getLayerNames()    
    output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]

    # Loading video
    cap = cv2.VideoCapture(type)  # use 0 for webcam   

    _, frame = cap.read()
    height, width, channels = frame.shape

    # providing codec for writing frames to video 
    fourcc = cv2.VideoWriter_fourcc(*'MP4V')

    # Write video with name & size. Should be of same size(width, height) as original video
    out_vid = cv2.VideoWriter('obj_detect4_'+str(type), fourcc, 20.0, (width,height))

    font = cv2.FONT_HERSHEY_COMPLEX_SMALL 
    starting_time = time.time()
    frame_id = 0

    while True:
        _, frame = cap.read()

        frame_id +=1

        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        height, width, channels = frame.shape       

        blob = cv2.dnn.blobFromImage(frame, 0.00392, (image_quality, image_quality), (0, 0, 0), True, crop=False)
        net.setInput(blob)

        outs = net.forward(output_layers)

        # For showing informations on screen
        class_ids = []
        confidences = []
        boxes = []
        for out in outs:
            for detection in out:
                # calculated scores, class_id, confidence

                if confidence > confidence_threshold:
                    # calculated center_x, center_y, w, h, x, y
                    boxes.append([x, y, w, h])
                    confidences.append(float(confidence))
                    class_ids.append(class_id)
                    print("confidences:", confidences)
                    print(class_ids)
                    print("boxes", boxes)

        indexes = cv2.dnn.NMSBoxes(boxes, confidences, confidence_threshold, 0.4)

        for i in range(len(boxes)):
            if i in indexes:
                x, y, w, h = boxes[i]
                label = str(classes[class_ids[i]])

        elapsed_time = time.time() - starting_time
        fps = frame_id / elapsed_time
        time_display = time.strftime("%a, %d%b%Y %H:%M:%S", time.localtime())
        cv2.putText(frame,"|FPS: " + str(round(fps,3)), (10, 40), font, 1, (0,255,0), 1)
        print(fps)

        # saving video frame by frame
        if i == '':
            pass
        else:
            if 'umbrella' in label:
                out_vid.write(frame)

        key = cv2.waitKey(5)
        if key == 27:
            break

    cap.release()
    out_vid.release()
    cv2.destroyAllWindows()

# calling function
vid_objects_detection("walking.mp4")

I have trimmed some minor calculations from the code and inserted comments instead, to reduce the length of the code.


Solution

  • Sometimes a video codec performs what is called keyframe compression. This means that one frame is stored completely, say every 10 frames, and all the frames in between are stored as changes or deltas. In that case, when you try to save only these in-between frames, they might not get saved. However, in these cases saving the frames works if you iterate sequentially over every frame.

    Maybe you can comment out the line out_vid = cv2.VideoWriter('obj_detect4_'+str(type), fourcc, 20.0, (width,height)) and just try saving the frames from the webcam/video stream based on your condition (see the sketch below).
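
    A minimal sketch of that idea, assuming the detection is wrapped in a detect_labels() helper (shown here only as a stub standing in for the blobFromImage / net.forward / NMSBoxes steps from the question) and that saving matching frames as individual images is acceptable; the output filename pattern is just an example:

        import cv2

        def detect_labels(frame):
            """Stub standing in for the YOLO detection steps from the question;
            it should return the list of class names detected in this frame."""
            return []

        cap = cv2.VideoCapture("walking.mp4")   # or 0 for the webcam
        saved = 0

        while True:
            ret, frame = cap.read()             # iterate sequentially over every frame
            if not ret:                         # end of stream or read failure
                break

            if "umbrella" in detect_labels(frame):
                # save this frame directly as an image instead of through VideoWriter
                cv2.imwrite(f"umbrella_frame_{saved:05d}.jpg", frame)
                saved += 1

        cap.release()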