python scrapy yaml scrapyd

Scrapy - Load a yaml file with a relative path inside the spider


I'm trying to deploy my Scrapy crawlers, but I have a YAML file that I load from inside the spider. This works when the spider is run from the shell with scrapy crawl <spider-name>, but when the spider is deployed to scrapyd, the path to the YAML file must be absolute.

Is there a way to use a relative path for the yaml file, even when spiders are deployed with scrapyd?

P.S:
The spider is deployed on scrapyd with:

scrapyd-deploy default -p <project-name>
curl http://127.0.0.1:6800/schedule.json -d project=<project-name> -d spider=<spider-name>

And the yaml file is loaded with:

with open('../categories/categories.yaml', 'r') as f:
    pass

Solution

  • I have found the answer here: scrapyd and file (pkgutil.get_data)

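    Briefly, you have to register the paths to these static files in setup.py so that they are packaged with the project, and then read them with pkgutil.get_data instead of open() with a relative path.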
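
    As a rough illustration of that approach, here is a minimal sketch. It assumes the YAML file is moved inside the project package; the package name myproject and the resource path categories/categories.yaml are hypothetical placeholders, and load_categories additionally assumes PyYAML is installed. pkgutil.get_data reads the file through the package's import loader, so it does not depend on the working directory scrapyd happens to use.

```python
import pkgutil

# setup.py side (shown as a comment; assumes the YAML lives inside the package):
#     setup(
#         name="project",
#         version="1.0",
#         packages=find_packages(),
#         package_data={"myproject": ["categories/*.yaml"]},
#         entry_points={"scrapy": ["settings = myproject.settings"]},
#     )


def read_package_resource(package, resource):
    """Return a data file bundled inside a package as text.

    `resource` is a '/'-separated path relative to the package directory.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package cannot be located or its
        # loader cannot serve data files.
        raise FileNotFoundError(f"{resource!r} not available in package {package!r}")
    return data.decode("utf-8")


def load_categories(package="myproject", resource="categories/categories.yaml"):
    """Parse the bundled YAML file (hypothetical default names)."""
    import yaml  # PyYAML; imported lazily so the reader above has no extra dependency

    return yaml.safe_load(read_package_resource(package, resource))
```

    Inside the spider, a call such as load_categories() would then replace the open('../categories/categories.yaml') line, both under scrapy crawl and under scrapyd.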