Tags: python, opencv, image-processing, data-science, scikit-image

How can I crop the car's number plate from this picture, given its relative bbox coordinates?


I have this pic: pic_to_be_cropped

I have the following relative coordinates:

[[0.6625, 0.6035714285714285], [0.7224999999999999, 0.6035714285714285], [0.7224999999999999, 0.6571428571428571], [0.6625, 0.6571428571428571], [0.6625, 0.6035714285714285]]

(However, I don't understand why there are 5 values here instead of the usual 4, or what they mean.)

My attempt with scikit-image, which shows the whole picture instead of cropping it:

import numpy as np
from skimage import io, draw

img = io.imread(pic)

vals = [[0.6625, 0.6035714285714285], [0.7224999999999999, 0.6035714285714285], [0.7224999999999999, 0.6571428571428571], [0.6625, 0.6571428571428571], [0.6625, 0.6035714285714285]]

vertices = np.asarray(vals)

rows, cols = draw.polygon(vertices[:, 0], vertices[:, 1])

crop = img.copy()

crop[:, :, -1] = 0
crop[rows, cols, -1] = 255

io.imshow(crop)
io.show()

# shows whole pic instead of cropping

My attempt with OpenCV raises an error because the coordinates are floats:

import cv2 as cv


vals = [[0.6625, 0.6035714285714285], [0.7224999999999999, 0.6035714285714285], [0.7224999999999999, 0.6571428571428571], [0.6625, 0.6571428571428571], [0.6625, 0.6035714285714285]]

x = vals[0][0]
y = vals[0][1]
width = vals[1][0] - x
height = vals[2][1] - y

img = cv.imread(pic)

crop_img = img[y:y+height, x:x+width]
cv.imshow("cropped", crop_img)
cv.waitKey(0)

#  TypeError: slice indices must be integers or None or have an __index__ method

How can I crop the car's number plate from this picture, given its relative bbox coordinates?

I am not limited to any framework, so if you think that TF or anything else might help, please suggest it.


Solution

  • Inspection of

    vals = [[0.6625, 0.6035714285714285], [0.7224999999999999, 0.6035714285714285], [0.7224999999999999, 0.6571428571428571], [0.6625, 0.6571428571428571], [0.6625, 0.6035714285714285]]
    

    shows that the first and the last entry in the list are identical, so the five points describe a closed polygon with four corners: the last point simply repeats the first. In image processing, the position (0, 0) is the top-left corner. Assuming each pair is given as (row, column), the values in the list are ordered as follows:

    [top_left, bottom_left, bottom_right, top_right, top_left]
    

    The fact that all numbers are between 0 and 1 suggests that these are relative coordinates. To rescale them to image dimensions, multiply the first value of each pair by the image height and the second by the image width:

    # dummy img sizes:
    image_height = 480
    image_width = 640
    
    # rescale to img dimensions, and convert to int, to allow slicing:
    bbox_coordinates = [[int(a[0] * image_height), int(a[1] * image_width)] for a in vals]
    

    Now you can use array slicing on the image to crop:

    top_left = bbox_coordinates[0]
    bottom_right = bbox_coordinates[2]
    
    bbox = img[top_left[0]:bottom_right[0], top_left[1]:bottom_right[1]]
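
The steps above can be collected into one helper. This is only a minimal sketch, not part of the original answer: the function name `relative_bbox_to_slices` and its `order` parameter are hypothetical. The default `order="rc"` follows the (row, column) reading used above. If the pairs are actually (x, y), as the OpenCV attempt in the question assumes, `order="xy"` swaps them. Taking the minimum and maximum over all vertices means the result does not depend on the vertex order or on the repeated closing point.

```python
import math


def relative_bbox_to_slices(rel_points, image_height, image_width, order="rc"):
    """Turn relative polygon vertices into integer (row_slice, col_slice).

    order="rc": each pair is (row, col), i.e. fractions of (height, width).
    order="xy": each pair is (x, y), i.e. fractions of (width, height).
    Both the function name and the `order` switch are illustrative assumptions.
    """
    if order not in ("rc", "xy"):
        raise ValueError("order must be 'rc' or 'xy'")

    if order == "rc":
        row_fracs = [p[0] for p in rel_points]
        col_fracs = [p[1] for p in rel_points]
    else:
        col_fracs = [p[0] for p in rel_points]
        row_fracs = [p[1] for p in rel_points]

    # Scale to pixels; floor the start and ceil the end so the whole box is
    # kept, then clamp to the image bounds.
    top = max(0, math.floor(min(row_fracs) * image_height))
    bottom = min(image_height, math.ceil(max(row_fracs) * image_height))
    left = max(0, math.floor(min(col_fracs) * image_width))
    right = min(image_width, math.ceil(max(col_fracs) * image_width))
    return slice(top, bottom), slice(left, right)


# Usage with a NumPy image (e.g. as returned by cv.imread or io.imread):
#   rows, cols = relative_bbox_to_slices(vals, img.shape[0], img.shape[1])
#   bbox = img[rows, cols]
```

Because the helper returns plain integer slices, it avoids the `TypeError` from the question and works the same way for images loaded with OpenCV or scikit-image.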