Why is HttpCacheMiddleware disabled in scrapyd?

Why does HttpCacheMiddleware need scrapy.cfg, and how do I work around this issue?

I use scrapyd-deploy to build the egg and deploy the project to scrapyd.

When the job is run, I see from the log output that the HttpCacheMiddleware is disabled because scrapy.cfg is not found.

2014-06-08 18:55:51-0700 [scrapy] WARNING: Disabled HttpCacheMiddleware: Unable to find scrapy.cfg file to infer project data dir

I checked the egg file, and scrapy.cfg is indeed not there, because the egg contains only the files in the project directory. I could be wrong, but I think the egg is built correctly.

 |- project/
 |      |-
 |      |-
 |      |- spiders/
 |            |- ...
 |- scrapy.cfg

Digging into the code more, I think one of the three if-conditions in MiddlewareManager is failing somehow.

        try:
            mwcls = load_object(clspath)
            if crawler and hasattr(mwcls, 'from_crawler'):
                mw = mwcls.from_crawler(crawler)
            elif hasattr(mwcls, 'from_settings'):
                mw = mwcls.from_settings(settings)
            else:
                mw = mwcls()
        except NotConfigured, e:
            if e.args:
                clsname = clspath.split('.')[-1]
                log.msg(format="Disabled %(clsname)s: %(eargs)s",
                        level=log.WARNING, clsname=clsname, eargs=e.args[0])
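
For what it's worth, the excerpt suggests the warning is logged when building the middleware raises NotConfigured, not when one of the branches is skipped. Here is a minimal, self-contained sketch of that pattern in modern Python. It does not import Scrapy: the NotConfigured class is a stand-in, and CacheMiddleware and build_middlewares are hypothetical names made up for illustration.

```python
class NotConfigured(Exception):
    """Stand-in for scrapy.exceptions.NotConfigured (assumption: same role)."""


class CacheMiddleware:
    """Hypothetical middleware that refuses to start without a data dir."""

    def __init__(self, data_dir):
        if data_dir is None:
            raise NotConfigured(
                "Unable to find scrapy.cfg file to infer project data dir")
        self.data_dir = data_dir


def build_middlewares(factories):
    """Instantiate each factory; skip (and report) those raising NotConfigured."""
    enabled, warnings = [], []
    for name, factory in factories:
        try:
            enabled.append(factory())
        except NotConfigured as e:
            # Only report when the exception carries a message,
            # mirroring the `if e.args:` check in the excerpt.
            if e.args:
                warnings.append("Disabled %s: %s" % (name, e.args[0]))
    return enabled, warnings


enabled, warnings = build_middlewares([
    ("CacheMiddleware", lambda: CacheMiddleware(None)),         # gets disabled
    ("CacheMiddleware", lambda: CacheMiddleware("/tmp/data")),  # stays enabled
])
```

If that reading is right, the message text after "Disabled HttpCacheMiddleware:" is simply the argument of the exception raised while the middleware was being set up.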


  • Place an empty scrapy.cfg under your working directory.

    As the source code shows, project_data_dir will try to find the closest scrapy.cfg and use it to infer the project data dir.
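
    To illustrate why an empty file is enough, here is a rough sketch of a "closest scrapy.cfg" lookup as I understand it: start in a directory and walk up through its parents until a scrapy.cfg is found. This is not Scrapy's actual code; find_closest_cfg is a hypothetical name, and the temporary directory layout is made up for the demo.

```python
import os
import tempfile


def find_closest_cfg(path=".", prevpath=None):
    """Walk up from `path`; return the first scrapy.cfg found, or '' if none."""
    if path == prevpath:  # dirname() stopped changing: filesystem root reached
        return ""
    path = os.path.abspath(path)
    cfgfile = os.path.join(path, "scrapy.cfg")
    if os.path.exists(cfgfile):
        return cfgfile
    return find_closest_cfg(os.path.dirname(path), path)


# Demo: an empty scrapy.cfg two levels above the starting directory is found.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "project", "spiders")
    os.makedirs(nested)
    open(os.path.join(root, "scrapy.cfg"), "w").close()  # empty file
    found = find_closest_cfg(nested)
    found_matches = os.path.samefile(found, os.path.join(root, "scrapy.cfg"))
```

    Only the file's location matters in a lookup like this, not its contents, which is why an empty scrapy.cfg can work as a marker.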