Scrapy/Python: run logic after yielded requests are finished


What I do:

# Spider methods; assumes `from scrapy import Request` at module level.
def parse(self, response):
    product_urls = response.css('.product-item a::attr(href)').extract()

    for product_url in product_urls:
        yield Request(product_url, callback=self.parse_product)

    print("Continue doing stuff....")


def parse_product(self, response):
    title = response.css('h1::text').extract_first()
    print(title)

In this example, the code first prints "Continue doing stuff...." and only afterwards prints the product titles. I would like it to work the other way round: make the requests and print the titles first, and only then print "Continue doing stuff....".

UPDATE: @Georgiy asked in the comments whether I need the previously scraped product data.

Yes. This is a simplified example; once the data has been fetched, I want to manipulate it.


Solution

  • You can count the outstanding requests and run the follow-up logic from the parse_product function once the last response has been handled. For example:

        # Spider methods; assumes `from scrapy import Request` at module level.
        def parse(self, response):
            product_urls = response.css('.product-item a::attr(href)').extract()

            # Number of product requests that have not been handled yet.
            self.count = len(product_urls)
            if self.count == 0:
                self.onEnd()
            else:
                for product_url in product_urls:
                    yield Request(product_url, callback=self.parse_product)

        def onEnd(self):
            print("Continue doing stuff....")

        def parse_product(self, response):
            title = response.css('h1::text').extract_first()
            print(title)
            # One fewer request is outstanding; run the final step after the last one.
            self.count -= 1
            if self.count == 0:
                self.onEnd()
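
    One caveat with the counter: it only reaches zero if every request ends up in parse_product. A request that fails, or a duplicate URL dropped by Scrapy's default duplicate filter, never reaches the callback, so onEnd would not run. The sketch below is one possible way to make the bookkeeping more robust and to keep the scraped data for the final step. It is plain Python with no Scrapy import, and CompletionTracker, expect, done and failed are hypothetical names used for illustration, not part of Scrapy.

```python
class CompletionTracker:
    """Collects results and calls `on_end` once every expected key is settled.

    Hypothetical helper for illustration; not part of Scrapy.
    """

    def __init__(self, on_end):
        self.on_end = on_end      # called once with the list of results
        self.pending = set()      # keys (e.g. URLs) still outstanding
        self.results = []         # data gathered from successful callbacks
        self.finished = False     # guards against calling on_end twice

    def expect(self, keys):
        """Register keys to wait for; return them de-duplicated, in order."""
        unique = list(dict.fromkeys(keys))
        self.pending.update(unique)
        if not self.pending:
            self._finish()
        return unique

    def done(self, key, result):
        """Record a successful result for `key`."""
        self.results.append(result)
        self._settle(key)

    def failed(self, key):
        """Mark `key` as settled without a result."""
        self._settle(key)

    def _settle(self, key):
        self.pending.discard(key)
        if not self.pending:
            self._finish()

    def _finish(self):
        if not self.finished:
            self.finished = True
            self.on_end(self.results)


# Simulated run: two distinct URLs (one listed twice), one of which fails.
collected = []
tracker = CompletionTracker(on_end=collected.extend)
urls = tracker.expect(["/a", "/b", "/a"])
tracker.done("/a", "Title A")
tracker.failed("/b")
```

    In a spider, parse would yield one Request per URL returned by expect, parse_product would call done, and a function passed as the errback argument of Request would call failed. Passing the key along in the request's cb_kwargs avoids relying on response.url, which can differ from the requested URL after a redirect.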