python image-processing python-imaging-library gevent

Python Pillow: improve script performance


I have a simple script that gets the image size for each URL in a list of image URLs, but it's far too slow when the list is large (e.g. with 120 URLs it can take 10 seconds to run):

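import requests
from io import BytesIO
from PIL import Image


def get_image_size(url):
    # Downloads the whole file, then lets Pillow parse it to read (width, height)
    data = requests.get(url).content
    try:
        im = Image.open(BytesIO(data))
        size = im.size
    except Exception:  # not a decodable image
        size = False
    return size

list_images = ['https://example.com/img.png', ...]
sizes = {}
for img in list_images:
    sizes[img] = get_image_size(img)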
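Since only the dimensions are needed, downloading whole files may be unnecessary. As a sketch (an alternative technique, not part of the original code), PNG files store width and height at fixed offsets: an 8-byte signature, then the IHDR chunk (4-byte length, the bytes "IHDR", 4-byte width, 4-byte height), so the first 24 bytes are enough. The helper png_size_from_header is hypothetical and handles PNG only; other formats such as JPEG need different parsing.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size_from_header(data):
    """Return (width, height) from the first 24+ bytes of a PNG, or False.

    Hypothetical helper, PNG only. Layout: 8-byte signature, then the IHDR
    chunk: 4-byte length, b"IHDR", 4-byte width, 4-byte height (big-endian).
    """
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return False
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# Build a fake header for a 256x128 PNG to show the layout
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 256, 128)
print(png_size_from_header(header))  # (256, 128)
```

Fetching only those bytes assumes the server honours HTTP Range requests, e.g. requests.get(url, headers={'Range': 'bytes=0-23'}).content; a server that ignores Range returns the full body, which still works but saves nothing.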

I already tried Gevent, which cut the processing time by about 50%, but that's not enough. Is there another way to make this script run faster?
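For comparison, a thread pool from the standard library is another way to overlap the network waits. This swaps gevent for concurrent.futures and is not what the question used. The fetch_size parameter is a hypothetical hook so the sketch runs without a network; in the question's setting it would be get_image_size. The achievable speedup is still bounded by network and server latency.

```python
from concurrent.futures import ThreadPoolExecutor


def get_sizes_concurrently(urls, fetch_size, max_workers=20):
    """Run fetch_size(url) for every URL in a thread pool.

    Returns a dict mapping each URL to whatever fetch_size returned.
    max_workers=20 is an arbitrary starting point, not a tuned value.
    """
    urls = list(urls)  # iterated twice below, so materialize generators
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL with its result
        results = pool.map(fetch_size, urls)
        return dict(zip(urls, results))


# Offline usage with a stand-in for get_image_size
fake_sizes = {"a.png": (800, 600), "b.png": False}
print(get_sizes_concurrently(list(fake_sizes), fake_sizes.get))
```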

The final goal is to get the 5 biggest images of the data set.
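Once the sizes are collected, picking the five biggest does not require a full sort; heapq.nlargest does it directly. This sketch assumes "biggest" means pixel area (width * height) and that sizes maps each URL to the (width, height) tuple or False returned by get_image_size; biggest_images is a hypothetical helper name.

```python
import heapq


def biggest_images(sizes, n=5):
    """Return the n URLs with the largest pixel area (width * height).

    sizes maps URL -> (width, height) or False; failed entries are skipped.
    """
    valid = {url: wh for url, wh in sizes.items() if wh}
    return heapq.nlargest(n, valid, key=lambda url: valid[url][0] * valid[url][1])


sizes = {
    "a.png": (800, 600),
    "b.png": False,
    "c.png": (1920, 1080),
    "d.png": (100, 100),
}
print(biggest_images(sizes, n=2))  # ['c.png', 'a.png']
```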


Solution

  • You could use grequests (requests + gevent) and, instead of using Pillow to get the image size, read it from the Content-Length HTTP header. Note that Content-Length is the file size in bytes, not the pixel dimensions that Pillow's im.size returns:


    Performance usually depends on the network connection, the server's speed and the image size. For example:

    import pprint

    import grequests


    def downloadImages(images):
        """Map each URL to its Content-Length (a string, in bytes), or -1 on failure."""
        result = {}
        rs = (grequests.get(t) for t in images)
        # size=len(images) lets every request run concurrently
        downloads = grequests.map(rs, size=len(images))

        # grequests.map returns responses in request order, with None for
        # requests that raised an exception (e.g. connection errors)
        for url, download in zip(images, downloads):
            if download is not None and download.status_code == 200:
                # requests' headers are case-insensitive; -1 if the header is missing
                result[url] = download.headers.get('Content-Length', -1)
            else:
                result[url] = -1
        return result


    if __name__ == '__main__':
        urls = [
            'https://b.tile.openstreetmap.org/12/2075/1409.png',
            'https://b.tile.openstreetmap.org/12/2075/1410.png',
            'https://b.tile.openstreetmap.org/12/2075/1411.png',
            'https://b.tile.openstreetmap.org/12/2075/1412.png'
        ]

        sizes = downloadImages(urls)
        pprint.pprint(sizes)
    

    Returns:

    {'https://b.tile.openstreetmap.org/12/2075/1409.png': '40472',
     'https://b.tile.openstreetmap.org/12/2075/1410.png': '38267',
     'https://b.tile.openstreetmap.org/12/2075/1411.png': '36338',
     'https://b.tile.openstreetmap.org/12/2075/1412.png': '30467'}
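The values in the returned dictionary are strings (with -1 for failures), so comparing them directly compares text: '9' sorts above '40472'. A sketch of picking the five largest by file size converts each value to int first. top_by_bytes is a hypothetical helper, and byte size is only a proxy for "biggest", since compression means a larger file need not have more pixels.

```python
def top_by_bytes(sizes, n=5):
    """Return the n (url, bytes) pairs with the largest Content-Length.

    sizes maps URL -> Content-Length as a string, or -1 for failed requests.
    """
    parsed = [(url, int(length)) for url, length in sizes.items()]
    ok = [(url, length) for url, length in parsed if length >= 0]
    return sorted(ok, key=lambda item: item[1], reverse=True)[:n]


sizes = {
    'https://b.tile.openstreetmap.org/12/2075/1409.png': '40472',
    'https://b.tile.openstreetmap.org/12/2075/1410.png': '38267',
    'https://b.tile.openstreetmap.org/12/2075/1411.png': '36338',
    'https://b.tile.openstreetmap.org/12/2075/1412.png': '30467',
}
print(top_by_bytes(sizes, n=2))
```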