How do I get the velocity of a tracked object without calibration?


I am using YOLOv4 and DeepSORT to detect and track people in a frame.

My goal is to get the speed of a moving person in meaningful units without calibration, because I would like to move the camera running this model between rooms without recalibrating it each time. I understand this is a very difficult problem. I currently get speed in pixels per second, but that is inaccurate because objects closer to the camera appear to move faster.

My question is whether I can use the bounding box of the person detection as a measurement of the person's size in pixels and, by assuming an average human size (say 68 inches tall by 15 inches wide), obtain the "calibration" metrics I need to determine, in inches per second, how fast the person moved from Point A to Point B in the frame, based on the person's apparent size at Region A and at Region B.

In short, is there a way to use the apparent size of an object to determine how fast it is moving in the frame?

Any suggestions would be helpful! Thanks!

This is how I am calculating speed now.


import math

fps = 30  # camera frame rate

# Calculate the center of the bounding box (bbox = [x_min, y_min, x_max, y_max])
xCenter = int((bbox[0] + bbox[2]) / 2)
yCenter = int((bbox[1] + bbox[3]) / 2)

# metrics maps each track to its history:
# {track_id: [[frame, xCenter, yCenter], [frame, xCenter, yCenter], ...]}
values = metrics[track_id]

# Calculate displacement and speed between the last two observations.
if len(values) > 1:
    delta_frames = values[-1][0] - values[-2][0]
    delta_t = delta_frames / fps
    delta_x = values[-1][1] - values[-2][1]
    delta_y = values[-1][2] - values[-2][2]

    total_displacement = math.sqrt(delta_x ** 2 + delta_y ** 2)

    # Speed in pixels per second
    speed = total_displacement / delta_t


Solution

  • I think this is the answer I have been looking for.

    I calculate the height and width of the bounding box in pixels and divide them by the average human height and width to get the pixels per inch at that bounding box. I then use linspace() to interpolate the scale between Region A and Region B across the pixels travelled, and sum the result to get the distance. It is not very accurate, so there is probably room for improvement.

    The inaccuracies mainly come from the bounding box. Top to bottom, the bounding box fits the person fairly well, but left to right (the width) it does not, because it stretches to include the arms and legs. I am going to see whether I can use a human head detection alone as the measurement.

    import math
    import numpy as np

    # Average human dimensions in inches
    avg_person_width = 15
    avg_person_height = 65

    # Width and height of the bounding box in pixels
    bbox_width = bbox[2] - bbox[0]
    bbox_height = bbox[3] - bbox[1]

    # Pixels per inch within the bounding box
    pixels_per_inch_width = bbox_width / avg_person_width
    pixels_per_inch_height = bbox_height / avg_person_height

    sample = [frame_idx, xCenter, yCenter, pixels_per_inch_width, pixels_per_inch_height]
    if track_id in metrics:
        # Append the new sample to this track's history
        metrics[track_id].append(sample)
    else:
        # Start a new history for this track
        metrics[track_id] = [sample]

    values = metrics[track_id]

    # Calculate displacement and speed between the last two samples.
    if len(values) > 1:
        # Skip samples whose bounding box has zero width or height
        if min(values[-1][3:]) > 0 and min(values[-2][3:]) > 0:
            delta_frames = values[-1][0] - values[-2][0]
            delta_t = delta_frames / fps
            delta_x = values[-1][1] - values[-2][1]
            delta_y = values[-1][2] - values[-2][2]

            pixels_per_inch_width = values[-1][3]
            pixels_per_inch_height = values[-1][4]

            pixels_per_inch_width_2 = values[-2][3]
            pixels_per_inch_height_2 = values[-2][4]

            # Interpolate the scale across the pixels travelled on each axis
            scale_x = np.linspace(pixels_per_inch_width, pixels_per_inch_width_2, abs(delta_x))
            scale_y = np.linspace(pixels_per_inch_height, pixels_per_inch_height_2, abs(delta_y))

            # Each pixel covers 1 / (pixels per inch) inches, so sum the
            # reciprocals to get a distance in inches
            total_distance_x = np.sum(1.0 / scale_x)
            total_distance_y = np.sum(1.0 / scale_y)

            total_displacement = math.sqrt(total_distance_x ** 2 + total_distance_y ** 2)

            # Inches per second (IPS)
            speed_ips = total_displacement / delta_t

            # Miles per hour: 1 in/s = 0.056818182 mph.
            # An average human walks at < 3 mph.
            speed_mph = speed_ips * 0.056818182
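
The same idea can be packaged as a small standalone function, which makes it easier to check the units. This is only a sketch under stated assumptions, not a calibrated method: the function names `displacement_inches` and `speed_mph`, the `steps` parameter, and the choice to use the bounding-box height alone for both axes (because the width is the less reliable dimension) are illustrative and not part of the original code. It converts the interpolated pixels-per-inch scale to inches per pixel before multiplying by the pixel distance.

```python
import math
import numpy as np

# Assumed average person height in inches (same figure as the code above).
AVG_PERSON_HEIGHT_IN = 65.0


def displacement_inches(center_a, center_b, bbox_height_a, bbox_height_b,
                        person_height_in=AVG_PERSON_HEIGHT_IN, steps=50):
    """Estimate in-image displacement in inches between two detections.

    center_a, center_b: (x, y) box centres in pixels.
    bbox_height_a, bbox_height_b: box heights in pixels at each detection.
    """
    if bbox_height_a <= 0 or bbox_height_b <= 0:
        raise ValueError("bounding-box heights must be positive")

    # Straight-line pixel distance travelled by the box centre.
    pixel_distance = math.hypot(center_b[0] - center_a[0],
                                center_b[1] - center_a[1])

    # Pixels per inch at each end, interpolated linearly along the path.
    ppi_a = bbox_height_a / person_height_in
    ppi_b = bbox_height_b / person_height_in
    ppi_along_path = np.linspace(ppi_a, ppi_b, steps)

    # Each pixel covers 1 / ppi inches; average that over the path.
    inches_per_pixel = float(np.mean(1.0 / ppi_along_path))
    return pixel_distance * inches_per_pixel


def speed_mph(displacement_in, delta_frames, fps):
    """Convert a displacement in inches over delta_frames frames to mph."""
    delta_t = delta_frames / fps          # seconds
    inches_per_second = displacement_in / delta_t
    return inches_per_second * 3600 / 63360   # 63360 inches per mile


# Example: a 130 px tall box (2 px per inch) moves 50 px in 15 frames at 30 fps.
distance = displacement_inches((100, 100), (130, 140), 130, 130)
print(distance, speed_mph(distance, 15, 30))
```

Note that this only measures motion across the image. A person walking straight toward or away from the camera changes the box size while the centre barely moves, so that motion would be underestimated.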