I'm using the Images Pipeline from Scrapy, and for some images I'm getting this error:
[scrapy.pipelines.files] ERROR: File (unknown-error): Error processing file from <GET https://www.example.com/folder-name/image.jpg> referred in <None>
Traceback (most recent call last):
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\files.py", line 401, in media_downloaded
    checksum = self.file_downloaded(response, request, info)
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 101, in file_downloaded
    return self.image_downloaded(response, request, info)
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 105, in image_downloaded
    for path, image, buf in self.get_images(response, request, info):
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 125, in get_images
    image, buf = self.convert_image(orig_image)
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 151, in convert_image
    image.save(buf, 'JPEG')
  File "c:\users\user\anaconda2\lib\site-packages\PIL\Image.py", line 1916, in save
    self.load()
  File "c:\users\user\anaconda2\lib\site-packages\PIL\ImageFile.py", line 254, in load
    raise_ioerror(err_code)
  File "c:\users\user\anaconda2\lib\site-packages\PIL\ImageFile.py", line 59, in raise_ioerror
    raise IOError(message + " when reading image file")
IOError: broken data stream when reading image file
The images are available on the server (without redirects), and I can't find any difference between the images that work and the ones that don't. Any idea what I'm missing?
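One way to narrow this down without running the whole crawl is to repeat the failing step by hand: decode the downloaded bytes with Pillow and re-save them as JPEG, which is roughly what `convert_image` does in the traceback. This is a minimal sketch, assuming you have saved a few of the problem images locally; `reencode_like_pipeline` and `check_file` are hypothetical helpers, not Scrapy's code, and they skip the white-background flattening Scrapy applies to transparent images.

```python
from io import BytesIO

from PIL import Image


def reencode_like_pipeline(data):
    """Decode raw image bytes and re-save them as JPEG.

    Hypothetical, simplified stand-in for ImagesPipeline.convert_image:
    it does not reproduce Scrapy's handling of transparent images.
    """
    # Image.open only parses the header; pixel data is decoded lazily
    image = Image.open(BytesIO(data))
    if image.mode != "RGB":
        # JPEG needs RGB; convert() forces the decode (load()) here
        image = image.convert("RGB")
    buf = BytesIO()
    # For RGB sources, save() is what triggers load(), as in the traceback
    image.save(buf, "JPEG")
    return buf.getvalue()


def check_file(path):
    """Return 'OK', or the Pillow error message, for an image saved on disk."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        reencode_like_pipeline(data)
    except IOError as exc:  # on Python 3, IOError is an alias of OSError
        return "FAILED: %s" % exc
    return "OK"
```

Calling `check_file("image.jpg")` (hypothetical path) on images that fail in the crawl and on ones that succeed shows whether the error comes from Pillow's decoder itself, independent of Scrapy's download step.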
This seems to be a known issue. Upgrading the Pillow dependency (pip install Pillow --upgrade) fixed it.
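To confirm that the upgrade took effect in the interpreter Scrapy actually runs under, you can print the installed Pillow version before and after upgrading. This sketch assumes Python 3.8+ for `importlib.metadata`; on the Python 2 setup shown in the traceback, `pip show Pillow` reports the same information. `installed_pillow_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_pillow_version():
    """Return the installed Pillow version string, or None if it is missing."""
    try:
        # The distribution is named "Pillow" even though it is imported as PIL
        return version("Pillow")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_pillow_version())
```

Run it with the same Python executable that runs `scrapy crawl`; with several environments installed (as with Anaconda), `pip` may otherwise upgrade a different copy of Pillow.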