I have a simple script that gets the image size for a list of image URLs, but it's far too slow when the list is large (for example, 120 URLs can take about 10 seconds to process).
import requests
from io import BytesIO
from PIL import Image

def get_image_size(url):
    data = requests.get(url).content
    try:
        im = Image.open(BytesIO(data))
        size = im.size
    except Exception:
        size = False
    return size

list_images = ['https://example.com/img.png', ...]

for img in list_images:
    get_image_size(img)
I already tried Gevent, which saved me about 50% of the processing time, but that's still not enough. Is there another option to make this script run faster?
The final goal is to get the 5 biggest images of the data set.
You could use grequests (requests + gevent) and, instead of downloading and opening each image with Pillow, read the file size from the Content-Length HTTP header.
Performance will still depend on the network connection, the server speed, and the image sizes:
import pprint

import grequests

def downloadImages(images):
    result = {}
    # Issue all requests concurrently
    rs = (grequests.get(t) for t in images)
    downloads = grequests.map(rs, size=len(images))
    for download in downloads:
        _status = 200 == download.status_code
        _url = download.url
        if _status:
            # Read the file size (in bytes) from the Content-Length header
            for k, v in download.headers.items():
                if k.lower() == 'content-length':
                    result[_url] = v
                    break
        else:
            result[_url] = -1
    return result

if __name__ == '__main__':
    urls = [
        'https://b.tile.openstreetmap.org/12/2075/1409.png',
        'https://b.tile.openstreetmap.org/12/2075/1410.png',
        'https://b.tile.openstreetmap.org/12/2075/1411.png',
        'https://b.tile.openstreetmap.org/12/2075/1412.png'
    ]
    sizes = downloadImages(urls)
    pprint.pprint(sizes)
Returns:
{'https://b.tile.openstreetmap.org/12/2075/1409.png': '40472',
'https://b.tile.openstreetmap.org/12/2075/1410.png': '38267',
'https://b.tile.openstreetmap.org/12/2075/1411.png': '36338',
'https://b.tile.openstreetmap.org/12/2075/1412.png': '30467'}
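Since the stated goal is to get the 5 biggest images, you can then sort this dict by Content-Length. A minimal sketch, assuming `sizes` is the dict returned by `downloadImages` above (note that Content-Length is the file size in bytes as a string, so it is converted to int for sorting):

# Pick the 5 largest images by Content-Length (file size in bytes).
# Assumes `sizes` is the dict returned by downloadImages(); failed
# downloads were stored as -1 and are filtered out here.
top5 = sorted(
    ((url, int(size)) for url, size in sizes.items() if int(size) > 0),
    key=lambda item: item[1],
    reverse=True,
)[:5]

for url, size in top5:
    print(url, size)

If the servers reliably return Content-Length, switching the requests to grequests.head() might avoid downloading the image bodies at all, which should cut the runtime further.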