I have an extension that attaches to the spider_opened and spider_closed signals. The spider_opened method is called correctly, but spider_closed is not. I stop the spider by calling the scrapyd cancel endpoint.
import os

from scrapy import signals

# Note: `engine` is a SQLAlchemy engine created elsewhere in the project.


class SpiderCtlExtension(object):

    @classmethod
    def from_crawler(cls, crawler):
        ext = SpiderCtlExtension()
        ext.project_name = crawler.settings.get('BOT_NAME')
        crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_opened(self, spider):
        # Mark the scrapyd job as running when the spider starts.
        sql = """UPDATE ctl_crawler
                 SET status = 'RUNNING'
                 WHERE jobid = '{}' """.format(os.getenv("SCRAPY_JOB"))
        engine.execute(sql)

    def spider_closed(self, spider, reason):
        # Record the close reason (e.g. FINISHED, CANCELLED) when the spider stops.
        sql = """UPDATE ctl_crawler
                 SET status = '{}'
                 WHERE jobid = '{}' """.format(reason.upper(), os.getenv("SCRAPY_JOB"))
        engine.execute(sql)
Am I doing something wrong here?
This is a (Windows-specific) bug; see my bug report: https://github.com/scrapy/scrapyd/issues/83
The reason is that, because of the way the cancel method works on Windows, no shutdown handlers in the spider process are called, so the spider_closed signal never fires.
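Until that is fixed, one workaround is to record the cancellation yourself right after calling the cancel endpoint, since the spider process never gets a chance to run spider_closed. Below is a minimal sketch, not a definitive fix: it assumes a scrapyd instance at http://localhost:6800, the requests library, and the same ctl_crawler table and SQLAlchemy engine as in the question. cancel.json with the project and job parameters is the documented scrapyd cancel endpoint; the helper name and URL are illustrative.

import requests

SCRAPYD_URL = "http://localhost:6800"  # assumption: local scrapyd instance


def cancel_job(project, job_id, engine):
    # Ask scrapyd to cancel the job (documented cancel.json endpoint).
    resp = requests.post(
        "{}/cancel.json".format(SCRAPYD_URL),
        data={"project": project, "job": job_id},
    )
    resp.raise_for_status()

    # On Windows the spider process is killed without running spider_closed,
    # so update the status row here instead of relying on the extension.
    sql = """UPDATE ctl_crawler
             SET status = 'CANCELED'
             WHERE jobid = '{}' """.format(job_id)
    engine.execute(sql)

    return resp.json()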