I'm trying to deploy my Scrapy crawlers, but I have a problem: the spider loads a YAML file from inside its code. This works when the spider is run from the shell with scrapy crawl <spider-name>, but when the spider is deployed inside scrapyd, the path to the YAML file has to be absolute.

Is there a way to use a relative path for the YAML file, even when the spiders are deployed with scrapyd?
P.S.: The spider is deployed on scrapyd with:

scrapyd-deploy default -p <project-name>
curl http://127.0.0.1:6800/schedule.json -d project=<project-name> -d spider=<spider-name>
And the YAML file is loaded with:

import yaml

with open('../categories/categories.yaml', 'r') as f:
    categories = yaml.safe_load(f)  # relative path only resolves when run from the project directory
I have found the answer here: scrapyd and file (pkgutil.get_data)

Briefly, you have to register the paths to these static files in setup.py, and then read them with pkgutil.get_data instead of open().
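A minimal sketch of that approach, assuming the Scrapy project package is called myproject and the categories directory sits inside it (both names are assumptions inferred from the relative path above): declare the YAML file as package data in the setup.py that scrapyd-deploy builds the egg from, then read it with pkgutil.get_data so the lookup works inside the deployed egg.

# setup.py (the one scrapyd-deploy uses; package and file names are assumptions)
from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    # ship the YAML file inside the egg so it is available at runtime
    package_data={'myproject': ['categories/*.yaml']},
    entry_points={'scrapy': ['settings = myproject.settings']},
)

And in the spider, instead of open() with a relative path:

# 'myproject' is the assumed top-level package of the Scrapy project
import pkgutil
import yaml

raw = pkgutil.get_data('myproject', 'categories/categories.yaml')
categories = yaml.safe_load(raw.decode('utf-8'))

This works both with scrapy crawl and under scrapyd, because pkgutil.get_data asks the package loader for the file instead of relying on the current working directory.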