
Scrapy extension: spider_closed is not called


I have an extension that connects handlers to the spider_opened and spider_closed signals. The spider_opened handler is called correctly, but the spider_closed handler is not. I close the spider by calling the scrapyd cancel endpoint.
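
For reference, the cancel call looks roughly like this (host, port, project name and job id are all placeholders here):

import requests

requests.post(
    "http://localhost:6800/cancel.json",
    data={"project": "myproject", "job": "<job id>"},
)

The extension: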

import os

from scrapy import signals

# `engine` is a SQLAlchemy engine created elsewhere in the project.


class SpiderCtlExtension(object):

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()

        ext.project_name = crawler.settings.get('BOT_NAME')
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)

        return ext

    def spider_opened(self, spider):
        # SCRAPY_JOB is set by scrapyd for each job it starts.
        sql = """UPDATE ctl_crawler
              SET status = 'RUNNING'
              WHERE jobid = '{}'""".format(os.getenv("SCRAPY_JOB"))
        engine.execute(sql)

    def spider_closed(self, spider, reason):
        # `reason` is e.g. 'finished', 'cancelled' or 'shutdown'.
        sql = """UPDATE ctl_crawler
              SET status = '{}'
              WHERE jobid = '{}'""".format(reason.upper(), os.getenv("SCRAPY_JOB"))
        engine.execute(sql)

Am I doing something wrong here?


Solution

  • This is a Windows-specific bug; see my bug report: https://github.com/scrapy/scrapyd/issues/83

    The reason is the way the cancel method works: on Windows the spider process is killed outright, so no shutdown handlers run in that process and the spider_closed signal is never fired.
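
    Until the bug is fixed, one workaround is to reconcile the status table against scrapyd's listjobs.json API from outside the spider process: a row still marked RUNNING for a job that scrapyd no longer lists as pending or running means the process ended without spider_closed firing. A minimal sketch, assuming the same ctl_crawler table and SQLAlchemy engine as in the question (scrapyd URL, project name and connection string are placeholders):

import requests
from sqlalchemy import create_engine

SCRAPYD_URL = "http://localhost:6800"  # placeholder
PROJECT = "myproject"                  # placeholder
engine = create_engine("postgresql://user:pass@localhost/db")  # placeholder

def mark_killed_jobs():
    # Jobs scrapyd still considers pending or running.
    jobs = requests.get(
        "{}/listjobs.json".format(SCRAPYD_URL), params={"project": PROJECT}
    ).json()
    alive = {j["id"] for j in jobs.get("pending", []) + jobs.get("running", [])}

    # A RUNNING row for a job scrapyd no longer runs means the process
    # ended without spider_closed firing (e.g. it was cancelled).
    for (jobid,) in engine.execute(
        "SELECT jobid FROM ctl_crawler WHERE status = 'RUNNING'"
    ):
        if jobid not in alive:
            engine.execute(
                "UPDATE ctl_crawler SET status = 'KILLED' "
                "WHERE jobid = '{}'".format(jobid)
            )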