Tags: python, scrapy, scrapy-pipeline

IOError on Scrapy Images Pipeline


I'm using Scrapy's Images Pipeline, and for some images I get this error:

[scrapy.pipelines.files] ERROR: File (unknown-error): Error processing file from <GET https://www.example.com/folder-name/image.jpg> referred in <None>
Traceback (most recent call last):
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\files.py", line 401, in media_downloaded
    checksum = self.file_downloaded(response, request, info)
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 101, in file_downloaded
    return self.image_downloaded(response, request, info)
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 105, in image_downloaded
    for path, image, buf in self.get_images(response, request, info):
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 125, in get_images
    image, buf = self.convert_image(orig_image)
  File "c:\users\user\anaconda2\lib\site-packages\scrapy\pipelines\images.py", line 151, in convert_image
    image.save(buf, 'JPEG')
  File "c:\users\user\anaconda2\lib\site-packages\PIL\Image.py", line 1916, in save
    self.load()
  File "c:\users\user\anaconda2\lib\site-packages\PIL\ImageFile.py", line 254, in load
    raise_ioerror(err_code)
  File "c:\users\user\anaconda2\lib\site-packages\PIL\ImageFile.py", line 59, in raise_ioerror
    raise IOError(message + " when reading image file")
IOError: broken data stream when reading image file

The images are available on the server (no redirects), and I can't find any difference between the images that work and the ones that don't. Any idea what I'm missing?
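To check whether the failure comes from Pillow rather than from Scrapy, the step that fails in the traceback (decoding the image and re-saving it as JPEG inside `convert_image`) can be re-run on its own. This is a minimal sketch, assuming the problem image's bytes have been saved to a local file; `reencode_as_jpeg` is a hypothetical helper, and its color-mode handling is simplified compared with Scrapy's own `convert_image`.

```python
import io
import sys

from PIL import Image


def reencode_as_jpeg(data):
    """Decode image bytes and re-save them as JPEG, mirroring the failing
    step in the traceback. Returns None on success, else the error message."""
    try:
        image = Image.open(io.BytesIO(data))
        # Simplified (assumption): Scrapy also flattens transparency onto a
        # white background; a plain RGB conversion is enough to force a decode.
        if image.mode != "RGB":
            image = image.convert("RGB")
        buf = io.BytesIO()
        image.save(buf, "JPEG")  # save() calls load(), where the IOError is raised
    except (IOError, OSError) as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Usage: python check_image.py path/to/image.jpg [more paths...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, reencode_as_jpeg(f.read()) or "OK")
```

If this reproduces `broken data stream` with the Pillow version Scrapy is using but succeeds with a newer Pillow, the problem lies in Pillow rather than in the pipeline configuration.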


Solution

  • This seems to be a known issue with Pillow. Upgrading the Pillow dependency (pip install Pillow --upgrade) fixed it.
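Because `pip install Pillow --upgrade` only affects the environment that `pip` belongs to (the traceback shows an Anaconda install), it can be worth confirming that the interpreter running Scrapy actually sees the upgraded Pillow. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (on Python 2, running `python -m pip show Pillow` with the same interpreter reports the installed version); `describe_distribution` is a hypothetical helper name.

```python
import importlib.util
import sys
from importlib import metadata


def describe_distribution(dist_name, module_name):
    """Report the running interpreter, the installed version of a
    distribution, and where its top-level module would be imported from."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # not installed for this interpreter
    spec = importlib.util.find_spec(module_name)  # None if not importable
    origin = spec.origin if spec is not None else None
    return sys.executable, version, origin


if __name__ == "__main__":
    # Pillow is installed as the "Pillow" distribution but imported as "PIL".
    print(describe_distribution("Pillow", "PIL"))
```

Running this with the same `python` that launches `scrapy crawl` shows whether the upgrade landed in that environment; `python -m pip install --upgrade Pillow` ties the upgrade to that interpreter explicitly.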