
How do I create middleware for parse and parse_item using Scrapy?


I am using Scrapy and would like to check my database for a should_continue flag and raise a CloseSpider exception if it is false. However, according to the documentation (http://doc.scrapy.org/en/latest/topics/exceptions.html), CloseSpider can only be raised from a spider callback such as parse or parse_item.

I could add the check to every parse and parse_item callback in each spider, but that goes against the DRY principle. Is there a way to create a middleware for parse and parse_item that always runs before those callbacks are called?

I couldn't get it to trigger using DOWNLOADER_MIDDLEWARE or SPIDER_MIDDLEWARE. What's the correct way to do this?
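One possible reason a custom middleware never triggers is that it was not registered under the setting names Scrapy actually reads, which are plural: DOWNLOADER_MIDDLEWARES and SPIDER_MIDDLEWARES. A minimal settings.py sketch, assuming a hypothetical class named ShouldContinueMiddleware in a hypothetical module myproject.middlewares:

```python
# settings.py (sketch). Scrapy reads the plural names DOWNLOADER_MIDDLEWARES
# and SPIDER_MIDDLEWARES; a singular name is simply never looked up.
# "myproject.middlewares.ShouldContinueMiddleware" is a hypothetical import
# path: point it at wherever your middleware class actually lives.
DOWNLOADER_MIDDLEWARES = {
    # Lower numbers sit closer to the engine, so their process_request()
    # runs earlier; 543 is the value used by the `scrapy startproject` template.
    "myproject.middlewares.ShouldContinueMiddleware": 543,
}
```

Registering the class only makes Scrapy call it; the middleware itself still has to stop the crawl.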


Solution

  • When a callback raises CloseSpider, the only thing Scrapy does is call the close_spider() method of the execution engine: https://github.com/scrapy/scrapy/blob/master/scrapy/core/scraper.py#L152-L153

    You can call that method yourself, e.g. crawler.engine.close_spider(spider, reason), from a downloader or spider middleware to achieve the same result.

    This is also what the CloseSpider extension does: https://github.com/scrapy/scrapy/blob/master/scrapy/extensions/closespider.py
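A minimal sketch of that approach as a downloader middleware, assuming the classic process_request(request, spider) signature. The class name ShouldContinueMiddleware and the should_continue() helper are hypothetical; replace the helper with your real database query. Like the CloseSpider extension, it keeps a reference to the crawler and calls crawler.engine.close_spider():

```python
import logging

logger = logging.getLogger(__name__)


def should_continue() -> bool:
    """Hypothetical helper: replace with a real query that reads the
    should_continue flag from your database."""
    return True


class ShouldContinueMiddleware:
    """Hypothetical downloader middleware: checks the flag before each
    request and asks the engine to close the spider when it is false."""

    def __init__(self, crawler, check=should_continue):
        self.crawler = crawler
        self.check = check
        self.closing = False  # make sure close_spider() is requested only once

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this factory when it builds the middleware.
        return cls(crawler)

    def process_request(self, request, spider):
        if not self.closing and not self.check():
            self.closing = True
            logger.info("should_continue is false; closing spider %s", spider.name)
            # The same engine call Scrapy makes when a callback raises CloseSpider.
            self.crawler.engine.close_spider(spider, "should_continue_false")
        # Returning None lets Scrapy keep processing this request normally.
        return None
```

Because the check runs in the middleware, no individual parse or parse_item callback needs to know about it. As written, it queries the database once per request; if that is too expensive, you could cache the flag for a few seconds. Requests already in flight may still complete while the engine shuts down.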