Tags: python, scrapy, twisted

Scrapy middleware to replace single request with multiple requests


I want a middleware that will take a single Request and transform it into a generator of two different requests. As far as I can tell, the downloader middleware process_request() method can only return None, a single Response, or a single Request, not a generator of them. Is there a nice way to split an arbitrary request into multiple requests?

It also seems that the spider middleware method process_start_requests() runs only after the requests from start_requests have already been sent through the downloader. For example, if I set start_urls = ['https://localhost/'] and

def process_start_requests(self, start_requests, spider):
    yield Request('https://stackoverflow.com')

it fails with ConnectionRefusedError, having first tried (and failed) the localhost request.
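For reference, here is a minimal sketch of the setup I am testing with (the spider and middleware class names and the settings path are mine):

from scrapy import Request, Spider

class StartRequestsMiddleware:
    def process_start_requests(self, start_requests, spider):
        # Replace the original start requests entirely.
        yield Request('https://stackoverflow.com')

class LocalhostSpider(Spider):
    name = 'localhost'
    start_urls = ['https://localhost/']

    custom_settings = {
        'SPIDER_MIDDLEWARES': {
            # assumed module path for this project
            'myproject.middlewares.StartRequestsMiddleware': 50,
        },
    }

    def parse(self, response):
        pass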


Solution

  • I don't know what the logic behind transforming a request (before it is sent) into multiple requests would be, but you can still generate several requests (or even items) from a middleware, like this:

    from scrapy import Request

    def process_request(self, request, spider):
        # Schedule extra requests directly on the engine; returning None
        # lets the original request continue through the downloader.
        # 'myurl' and callback_method are placeholders.
        for _ in range(10):
            spider.crawler.engine.crawl(
                Request(url='myurl', callback=callback_method),
                spider)
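
    Expanding on that, here is a minimal sketch of a complete downloader middleware built this way, which drops the original request and schedules replacements in its place. The class name SplitRequestMiddleware, the meta key 'split', and the use of dont_filter are my own choices, and engine.crawl(request, spider) assumes a Scrapy version that still accepts the spider argument:

    from scrapy import Request
    from scrapy.exceptions import IgnoreRequest

    class SplitRequestMiddleware:
        def process_request(self, request, spider):
            # Only split requests the spider has explicitly marked.
            urls = request.meta.get('split')
            if not urls:
                return None  # let ordinary requests through unchanged

            # Schedule one replacement per URL; dont_filter keeps the
            # dupefilter from discarding near-duplicate replacements.
            for url in urls:
                spider.crawler.engine.crawl(
                    Request(url, callback=request.callback, dont_filter=True),
                    spider)

            # Silently drop the original request.
            raise IgnoreRequest()

    A spider would then trigger the split by setting request.meta['split'] to a list of URLs when building the original request.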