I have an image with dimensions (1920x1080) and proportional coordinates provided along with a description of the detected person region. I want to crop only the detected person from the image using the provided proportional coordinates. I looked up the PIL crop documentation and tried the following:
Provided in the integration documentation:
x0, y0: the x, y coordinates corresponding to the lower-right corner of the person detection box. They are proportional distances from the upper-left of the image.
x1, y1: the x, y coordinates corresponding to the upper-left corner of the person detection box. They are proportional distances from the upper-left of the image.
Sample integration code provided:
from PIL import Image
import requests

def img_crop(url, box):
    box = {
        'x0': 0.974,
        'x1': 0.922,
        'y0': 0.502,
        'y1': 0.315
    }
    img = Image.open(requests.get(url, stream=True).raw)
    h, w = img.size
    print(img.size)
    return img.crop((box['x0']*h, box['y0']*w, box['x1']*h, box['y1']*w))
This results in the following error:
ValueError: Coordinate 'right' is less than 'left'
But your drawing contradicts your own description of what x0, y0, x1, y1 are. It says (in a picture of text, by the way; it is preferable to avoid that) that x0, y0 is the lower-right corner, and x1, y1 the upper-left corner.
Just swap x0, y0 and x1, y1.
Also, note that the coordinate system in PIL (and generally speaking in most image processing systems, since this is how image formats themselves are laid out) starts from the upper-left corner. Like English text: pixels are organized from left to right, and from top to bottom.
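You can see the origin for yourself with a minimal sketch on a small synthetic image (just an illustration, nothing from your pipeline):

from PIL import Image

# PIL's origin (0, 0) is the upper-left pixel
img = Image.new('RGB', (4, 3))           # width=4, height=3, all black
img.putpixel((0, 0), (255, 0, 0))        # red pixel at the upper-left corner
print(img.getpixel((0, 0)))              # (255, 0, 0)
# crop takes (left, upper, right, lower); this grabs the top-left 2x2 region
print(img.crop((0, 0, 2, 2)).size)       # (2, 2)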
EDIT: (answer to your comment)
One way would be to really just swap them and replace your .crop line with:
return img.crop((box['x1']*h, box['y1']*w, box['x0']*h, box['y0']*w))
This would work in your code. Nevertheless, there are some other changes that are preferable. First of all, you call the width of the image h, and the height of the image w. Of course, that is not a problem from a Python point of view, but it doesn't help readability (I surmise you did so because when images are np.arrays, such as OpenCV images, you get w and h with h, w, _ = img.shape. But PIL's .size returns w first and h second). And then, you inverted w and h in the crop line to be consistent.
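To see the difference concretely, here is a small sketch (not your code) comparing the two conventions:

import numpy as np
from PIL import Image

img = Image.new('RGB', (1920, 1080))  # width=1920, height=1080
w, h = img.size                       # PIL: (width, height)
print(w, h)                           # 1920 1080
h2, w2, _ = np.asarray(img).shape     # NumPy/OpenCV layout: (height, width, channels)
print(h2, w2)                         # 1080 1920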
Secondly, it is quite strange to rely on the fact that x0 and y0 are the largest x and y of the box, and x1, y1 the smallest. It would be better to do the swap in the calling code. You did not provide that code, which is why I did not try to show the correction: it has to be done in code that is not provided. (You did provide a box, to override what is passed, so you could do the swap in that box as well:)
box = {
    'x1': 0.974,
    'x0': 0.922,
    'y1': 0.502,
    'y0': 0.315
}
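For example, assuming the detection output arrives as a dict in your calling code (the name detection below is hypothetical; the values are your sample ones), the swap could live there instead of inside img_crop:

# Hypothetical detection output: (x0, y0) = lower-right, (x1, y1) = upper-left
detection = {'x0': 0.974, 'x1': 0.922, 'y0': 0.502, 'y1': 0.315}

# Rename the corners once, so (x0, y0) is upper-left and (x1, y1) lower-right
box = {'x0': detection['x1'], 'y0': detection['y1'],
       'x1': detection['x0'], 'y1': detection['y0']}
print(box)  # {'x0': 0.922, 'y0': 0.315, 'x1': 0.974, 'y1': 0.502}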
But the safest way, especially since you seem unsure about where all the corners are, and taking into account that sometimes x0 could be smaller than x1, while y0 is bigger than y1, is to compute which one is the min and which one is the max.
Like this:
from PIL import Image
import requests

def img_crop(url, box):
    # Hard-coded box overriding the argument, as in your snippet
    box = {
        'x0': 0.216,
        'x1': 0.419,
        'y0': 0.237,
        'y1': 0.697
    }
    img = Image.open(requests.get(url, stream=True).raw)
    w, h = img.size  # PIL returns (width, height)
    print(img.size)
    # Take min/max so the corner order no longer matters
    xmin = min(box['x0'], box['x1'])
    xmax = max(box['x0'], box['x1'])
    ymin = min(box['y0'], box['y1'])
    ymax = max(box['y0'], box['y1'])
    # crop expects (left, upper, right, lower) in pixels
    return img.crop((xmin*w, ymin*h, xmax*w, ymax*h))
There, no problem. Just pass the two x and the two y in the order x, y, x, y, without bothering about which x to send first and which y to send first.
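For completeness, a usage sketch (the URL is a placeholder, and the box argument is currently ignored because of the hard-coded dict inside img_crop, as in your snippet):

url = 'https://example.com/person.jpg'  # placeholder, not a real endpoint
person = img_crop(url, box={})
person.save('person_crop.png')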