
Change CSV result of Image Pipeline on Scrapy


I'm using the default Scrapy Images Pipeline and I'm exporting my data as CSV. The last field is auto-filled with a list of dicts containing the original URL, local path and checksum. However, I only need a string containing the local path. How can I do that?


Solution

  • I guess you are getting results like

    item["images"] = [
      {'checksum': '2b00042f7481c7b056c4b410d28f33cf',
       'path': 'full/0a79c461a4062ac383dc4fade7bc09f1384a3910.jpg',
       'url': 'http://www.example.com/files/product1.pdf'}]
    

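    Add a pipeline of your own that runs after ImagesPipeline (give it a higher number than ImagesPipeline in ITEM_PIPELINES) and do this in its process_item() method. If your item is a scrapy.Item, it must also declare a path field.

    def process_item(self, item, spider):
        # ImagesPipeline stores item["images"] as a list of dicts
        # (url, path, checksum), not a dict, so iterate the list directly
        images = item.pop("images", [])

        # Keep only the local paths, joined into one string for the CSV column
        item["path"] = ",".join(img["path"] for img in images)

        # A pipeline must return the item so it reaches the exporter
        return item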
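    To try the transformation without running a crawl, here is a minimal, framework-free sketch that applies it to a plain dict shaped like the ImagesPipeline output above. The class name ImagePathPipeline, the "path" field name and the module path myproject.pipelines are hypothetical placeholders. The ITEM_PIPELINES dict shows the ordering idea: lower numbers run first, so ImagesPipeline fills item["images"] before the custom pipeline reads it.

```python
# Hypothetical pipeline: replaces the list of dicts that ImagesPipeline
# writes to item["images"] with a single string of local paths.
class ImagePathPipeline:
    def process_item(self, item, spider):
        images = item.pop("images", [])
        # Join the local paths so the CSV column holds one plain string
        item["path"] = ",".join(img["path"] for img in images)
        return item


# Hypothetical settings.py entry: lower numbers run first, so
# ImagesPipeline (1) runs before ImagePathPipeline (300).
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
    "myproject.pipelines.ImagePathPipeline": 300,
}

if __name__ == "__main__":
    item = {
        "images": [
            {"checksum": "2b00042f7481c7b056c4b410d28f33cf",
             "path": "full/0a79c461a4062ac383dc4fade7bc09f1384a3910.jpg",
             "url": "http://www.example.com/files/product1.pdf"},
        ]
    }
    print(ImagePathPipeline().process_item(item, spider=None))
    # {'path': 'full/0a79c461a4062ac383dc4fade7bc09f1384a3910.jpg'}
```

    Joining with "," keeps one column per item even when several images were downloaded; the CSV writer quotes the value if it contains commas.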