I am trying to download all images from a website, but I only get one image back per page/item. I want my spider to download every image present on the page.
for elem in response.xpath("//img"):
    img_url = elem.xpath("@src").extract_first()
    l.add_value('image_urls', [img_url])
    l.add_value('url', response.url)
    l.add_value('project', self.settings.get('BOT_NAME'))
    l.add_value('spider', self.name)
    l.add_value('server', socket.gethostname())
    l.add_value('date', datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    return l.load_item()
When I change .extract_first() to .extract() the spider stops running, and I cannot work out how to pass each image URL (there can be dozens on a page) so that it becomes its own item and download.
Any help would be greatly appreciated!
You are only getting one image because return exits your method immediately. Use yield instead of return for the desired behavior.
See this other answer for details.
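To make the difference concrete, here is a minimal sketch in plain Python, so it runs without Scrapy. The function names, the sample URLs, and the dict-shaped items are all made up for illustration; in a real spider the loop would live in your callback and the list of sources would come from something like response.xpath("//img/@src").extract().

```python
from datetime import datetime
import socket


def parse_with_return(image_urls):
    """Mimics the original callback: return ends the function on the first pass."""
    for img_url in image_urls:
        return {"image_urls": [img_url]}


def parse_with_yield(image_urls, page_url):
    """Generator version: produces one item per image instead of stopping early."""
    for img_url in image_urls:
        if not img_url:  # skip <img> tags that have no src attribute
            continue
        # Build a fresh item on every iteration so values do not pile up
        # from one image to the next.
        yield {
            "image_urls": [img_url],
            "url": page_url,
            "server": socket.gethostname(),
            "date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        }


# Hypothetical sample data standing in for the extracted src attributes.
srcs = ["http://example.com/a.png", None, "http://example.com/b.png"]

first_only = parse_with_return(srcs)
items = list(parse_with_yield(srcs, "http://example.com/page"))

print(first_only["image_urls"])          # only the first image
print([i["image_urls"] for i in items])  # one item per image with a src
```

If you keep using an item loader, it is probably worth creating a new loader inside the loop as well, since add_value appends to whatever the loader already holds. Relative src values may also need response.urljoin(img_url) before the images pipeline can fetch them.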