Tags: svg, web-scraping, scrapy, scrapy-pipeline

Saving (.svg) images using Scrapy


I'm using Scrapy and I want to save some of the .svg images from a webpage locally on my computer. The URLs for these images have the structure '__.com/svg/4/8/3/1425.svg' (each is a full working URL, https included).

I've defined the item in my items.py file:

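import scrapy


class ImageItem(scrapy.Item):
    image_urls = scrapy.Field()  # URLs the pipeline should download
    images = scrapy.Field()  # filled in by the pipeline with download results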

I've added the following to my settings:

ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}

IMAGES_STORE = '../Data/Silks'
MEDIA_ALLOW_REDIRECTS = True

In the main parse function I'm calling:

imageItem = ImageItem()
imageItem['image_urls'] = [url]

yield imageItem

But it doesn't save the images. I've followed the documentation and tried numerous things, but I keep getting the following error:

StopIteration: <200 https://www.________.com/svg/4/8/3/1425.svg>

During handling of the above exception, another exception occurred:
......
......
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x1139233b0>

Am I missing something? Can anyone help? I am fully stumped.


Solution

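  • Gallaecio was right! Scrapy's ImagesPipeline was having an issue with the .svg file type: it opens every download with Pillow (PIL), which cannot read SVG, hence the UnidentifiedImageError above. I changed the ImagesPipeline to the FilesPipeline and it works!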

    For anyone stuck, the relevant page in the Scrapy documentation is "Downloading and processing files and images".
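
For reference, here is a minimal sketch of what that switch might look like, assuming the same project layout as in the question. The helper name build_file_item is hypothetical and only there for illustration. The setting and field names (FILES_STORE, file_urls) are the FilesPipeline defaults that take the place of IMAGES_STORE and image_urls. The item is a plain dict here, which Scrapy accepts as an item, so the snippet runs on its own.

```python
# settings.py (sketch): use FilesPipeline instead of ImagesPipeline.
# FilesPipeline saves the response body as-is and does not open it with
# Pillow, so SVG files do not trigger UnidentifiedImageError.
ITEM_PIPELINES = {
    "scrapy.pipelines.files.FilesPipeline": 1,
}

FILES_STORE = "../Data/Silks"  # replaces IMAGES_STORE from the question
MEDIA_ALLOW_REDIRECTS = True   # unchanged from the question


def build_file_item(url):
    """Hypothetical helper: build an item that FilesPipeline can process.

    FilesPipeline reads the URLs to download from the "file_urls" key
    (ImagesPipeline reads "image_urls" instead).
    """
    return {"file_urls": [url]}
```

In the parse function, the item from the question would then become `yield build_file_item(url)`. If you keep a scrapy.Item subclass instead of a dict, it would need file_urls and files fields in place of image_urls and images.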