As you can see from the code below, I'm scraping some data with Scrapy. Everything works, but I'm not happy with how the scraped data is stored. With the current code, the results for 'X' and 'Y' come out as two columns side by side (which is fine), but the results for 'U' show up as a single row because they come from a second loop. What I would like is the scraped data in three columns side by side: X / Y / U. Can anyone help with this? Thanks in advance!
def parse(self, response):
    U = []
    for l in response.css('div.property-info-wrapper'):
        yield {
            'X': l.css('span.info-price::text').extract_first(),
            'Y': l.css('li::text').extract_first(),
        }
    for i in response.selector.xpath('//div[@class="property-info-location ellipsis-element-control"]/text()').extract():
        U.append(i)
    yield {'U':U}
You can use itertools.zip_longest to zip both results together and yield each one based on its truth value.*
from itertools import zip_longest

def parse(self, response):
    # Location strings, one per listing
    locations = response.selector.xpath('//div[@class="property-info-location ellipsis-element-control"]/text()').extract()
    # One selector per listing's info block
    wrappers = response.css('div.property-info-wrapper')
    # zip_longest pads the shorter sequence with None
    for wrapper, location in zip_longest(wrappers, locations):
        if wrapper:
            yield {
                'X': wrapper.css('span.info-price::text').extract_first(),
                'Y': wrapper.css('li::text').extract_first(),
            }
        if location:
            yield {'U': location}  # since the spider needs to return a dict
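If the aim is one item per listing with X, Y and U in the same row, a possible variant is to build a single dict from each zipped group instead of yielding the parts separately. The following is only a minimal sketch of that idea: `build_rows` is a hypothetical helper (not part of Scrapy), the sample values are made up, and plain lists stand in for the values extracted from the response. It also assumes the n-th location belongs to the n-th listing.

```python
from itertools import zip_longest


def build_rows(prices, details, locations):
    """Yield one dict per listing with X, Y and U side by side.

    Hypothetical helper: the three arguments are plain lists standing in
    for the values extracted from the response.
    """
    # zip_longest pads the shorter lists with None, so no value is dropped
    for price, detail, location in zip_longest(prices, details, locations):
        yield {'X': price, 'Y': detail, 'U': location}


# Made-up sample data: two complete listings plus one extra location
rows = list(build_rows(['100', '200'], ['2 beds', '3 beds'], ['Paris', 'Lyon', 'Nice']))
```

In the spider itself, the same shape would mean yielding one dict with all three keys per zipped pair, so that each exported row carries X, Y and U together.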
* itertools.zip_longest(*iterables, fillvalue=None): Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.
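To make the padding behaviour concrete, here is a small standalone sketch with plain lists (the sample values are invented for illustration):

```python
from itertools import zip_longest

prices = ['100', '200', '300']
locations = ['Paris', 'Lyon']

# The shorter list is padded with the default fillvalue, None
pairs = list(zip_longest(prices, locations))
# pairs == [('100', 'Paris'), ('200', 'Lyon'), ('300', None)]

# A custom fillvalue replaces None for the missing entries
padded = list(zip_longest(prices, locations, fillvalue=''))
# padded == [('100', 'Paris'), ('200', 'Lyon'), ('300', '')]
```

This is why the truth-value checks matter: a padded entry is None, so it is skipped rather than yielded.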