python web-scraping scrapy extract html-content-extraction

Crawl pictures from a website with Scrapy


I want to crawl the image of each bottle of wine from the vinnicolas website and save it in a CSV file.

Unfortunately, I got some errors:

Spider: https://gist.github.com/anonymous/6424305

pipelines.py: https://gist.github.com/nahali/6434932

settings.py:


Solution

  • Your parse_wine_page does not set the "image_urls" field on the items, so the images pipeline has nothing to download. Set it to a list of absolute image URLs:

    import urlparse
    ...
    
        def parse_wine_page(self, response):
            ...
            hxs = HtmlXPathSelector(response)
            content = hxs.select('//*[@id="glo_right"]')
            for res in content:
                ...
                #item ["Image"]= map(unicode.strip, res.select('//div[@class="pro_detail_tit"]//div[@class="pro_titre"]/h1/text()').extract())
                # Resolve each img src against the page URL so the item holds absolute URLs
                srcs = res.select('./div[@class="pro_col_left"]/img/@src').extract()
                item['image_urls'] = [urlparse.urljoin(response.url, src) for src in srcs]
                items.append(item)
            return items
    

    Also make sure your Projetvinnicolas3Item class declares both "images" and "image_urls" as Field() attributes.
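
To see what the URL-joining step in parse_wine_page produces, here is a minimal standalone sketch that runs without Scrapy. The page URL and src values are made up for illustration, and the helper name absolute_image_urls is hypothetical. The import tries Python 3 first and falls back to the Python 2 urlparse module used in the snippet above.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2, as in the answer's snippet


def absolute_image_urls(page_url, srcs):
    """Resolve each (possibly relative) img src against the page URL.

    Hypothetical helper, not part of Scrapy.
    """
    return [urljoin(page_url, src.strip()) for src in srcs]


# Made-up page URL and src values, for illustration only.
page_url = "http://www.example.com/wines/detail.html"
srcs = [
    "/images/bottle1.jpg",                 # site-root-relative path
    "../img/bottle2.jpg",                  # path relative to the page
    "http://cdn.example.com/bottle3.jpg",  # already absolute, left unchanged
]
urls = absolute_image_urls(page_url, srcs)
for url in urls:
    print(url)
```

A list like this is what "image_urls" should hold. Relative paths would presumably fail to download, since the pipeline needs complete URLs to request.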