Tags: python, python-3.x, for-loop, scrapy, yield

Properly arranging the results of two for loops


As the code below shows, I'm scraping some data with Scrapy. Everything works, but I'm not happy with the way the scraped data is stored. With the current code, 'X' and 'Y' come out as two columns side by side (which is fine), but the results for 'U' end up in a single row because they are collected in a second loop. What I would like is the scraped data in three columns side by side: X / Y / U. Can anyone help with this? Thanks in advance!

def parse(self, response):
    U = []
    for l in response.css('div.property-info-wrapper'):
        yield {
            'X': l.css('span.info-price::text').extract_first(),
            'Y': l.css('li::text').extract_first(),
        }

    for i in response.selector.xpath('//div[@class="property-info-location ellipsis-element-control"]/text()').extract():
        U.append(i)
    yield {'U':U}

Solution

  • You can use itertools.zip_longest to zip both result lists together and yield each element only if it is truthy, that is, not a None padding value *.

    from itertools import zip_longest
    
    def parse(self, response):
        locations = response.selector.xpath('//div[@class="property-info-location ellipsis-element-control"]/text()').extract()
        css = response.css('div.property-info-wrapper')
    
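        # Pair each wrapper selector with its location text; the shorter list is padded with None.
        for wrapper, loc in zip_longest(css, locations):
            if wrapper:
                yield {
                    'X': wrapper.css('span.info-price::text').extract_first(),
                    'Y': wrapper.css('li::text').extract_first(),
                }
            if loc:
                yield {'U': loc}  # wrapped in a dict, since the spider must yield dicts or items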
    

    * itertools.zip_longest(*iterables, fillvalue=None): Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with fillvalue. Iteration continues until the longest iterable is exhausted.
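
The snippet in the answer still yields the X/Y pair and the U value as separate items, so an export would likely put them on separate rows. If the goal is three columns side by side, one option is to merge each pair into a single dict before yielding it. The following is a minimal sketch of that idea, not the answerer's method: it uses plain tuples and strings as stand-ins for the Scrapy selectors, and the helper name `merge_rows` and the sample data are hypothetical.

```python
from itertools import zip_longest


def merge_rows(listings, locations):
    """Yield one dict per row holding X, Y and U together.

    `listings` is a list of (X, Y) tuples and `locations` a list of U
    values; both are stand-ins for what the selectors would extract.
    """
    for listing, location in zip_longest(listings, locations):
        # zip_longest pads the shorter input with None (the default fillvalue).
        x, y = listing if listing is not None else (None, None)
        yield {'X': x, 'Y': y, 'U': location}


# Hypothetical sample data: three listings but only two locations.
listings = [('100', '2 beds'), ('200', '3 beds'), ('300', '1 bed')]
locations = ['Paris', 'Lyon']

rows = list(merge_rows(listings, locations))
for row in rows:
    print(row)
```

Inside a spider, the same pattern would presumably mean building the X and Y values from each wrapper selector and yielding a single `{'X': ..., 'Y': ..., 'U': ...}` dict per iteration. This assumes the wrappers and the locations appear in the same order on the page.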