Scrapyd-Deploy: SPIDER_MODULES not found


I am trying to deploy a Scrapy 2.1.0 project with scrapyd-deploy 1.2 and get this error:

scrapyd-deploy example
/Library/Frameworks/Python.framework/Versions/3.8/bin/scrapyd-deploy:23: ScrapyDeprecationWarning: Module `scrapy.utils.http` is deprecated, Please import from `w3lib.http` instead.
  from scrapy.utils.http import basic_auth_header
fatal: No names found, cannot describe anything.
Packing version r1-master
Deploying to project "crawler" in http://myip:6843/addversion.json
Server response (200):
{"node_name": "spider1", "status": "error", "message": "/usr/local/lib/python3.8/dist-packages/scrapy/utils/project.py:90: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION\n  warnings.warn(\nTraceback (most recent call last):\n  File \"/usr/lib/python3.8/runpy.py\", line 193, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File \"/usr/lib/python3.8/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"/usr/local/lib/python3.8/dist-packages/scrapyd/runner.py\", line 40, in <module>\n    main()\n  File \"/usr/local/lib/python3.8/dist-packages/scrapyd/runner.py\", line 37, in main\n    execute()\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/cmdline.py\", line 142, in execute\n    cmd.crawler_process = CrawlerProcess(settings)\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 280, in __init__\n    super(CrawlerProcess, self).__init__(settings)\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 152, in __init__\n    self.spider_loader = self._get_spider_loader(settings)\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/crawler.py\", line 146, in _get_spider_loader\n    return loader_cls.from_settings(settings.frozencopy())\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 60, in from_settings\n    return cls(settings)\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 24, in __init__\n    self._load_all_spiders()\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/spiderloader.py\", line 46, in _load_all_spiders\n    for module in walk_modules(name):\n  File \"/usr/local/lib/python3.8/dist-packages/scrapy/utils/misc.py\", line 69, in walk_modules\n    mod = import_module(path)\n  File \"/usr/lib/python3.8/importlib/__init__.py\", line 127, in import_module\n    return 
_bootstrap._gcd_import(name[level:], package, level)\n  File \"<frozen importlib._bootstrap>\", line 1014, in _gcd_import\n  File \"<frozen importlib._bootstrap>\", line 991, in _find_and_load\n  File \"<frozen importlib._bootstrap>\", line 973, in _find_and_load_unlocked\nModuleNotFoundError: No module named 'crawler.spiders_prod'\n"}

crawler.spiders_prod is the first module defined in SPIDER_MODULES.

Part of crawler/settings.py:

SPIDER_MODULES = ['crawler.spiders_prod', 'crawler.spiders_dev']
NEWSPIDER_MODULE = 'crawler.spiders_dev'

The crawler works locally, but once deployed it fails to import the folder where my spiders live, whatever I name that folder.

scrapyd-deploy setup.py:

# Automatically created by: scrapyd-deploy

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = crawler.settings']},
)

scrapy.cfg:

[deploy:example]
url = http://myip:6843/
username = test
password = whatever.
project = crawler
version = GIT

Is this possibly a bug or am I missing something?


Solution

  • Every folder listed in SPIDER_MODULES has to be initialised as a Python package. This is done by simply placing the following (possibly empty) file into each folder defined as a module:

    __init__.py
    

    This solved the problem described above.

    Learning:

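    If you want to split your spiders into several folders, it is not enough to simply create a folder and list it as a module in the settings file; you also need to place an __init__.py file in the new folder. Funnily enough, the crawler works locally without the file; only deployment to scrapyd fails. A likely reason is that find_packages() in setup.py only picks up folders that contain __init__.py, so the folder is left out of the deployed egg, whereas a local Python 3 import accepts the folder as a namespace package.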
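
To see the difference between a local import and a packaged deployment, here is a minimal sketch using only the standard library. It is not scrapyd-deploy's own code: packages_with_init() is a hypothetical stand-in that approximates the rule setuptools' find_packages() applies (a folder counts only if it contains __init__.py), and missing_init() and build_demo_project() are illustrative helpers. The folder names mirror the question; the project is built in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def packages_with_init(root):
    """Return dotted names of folders under root that contain __init__.py.

    Hypothetical stand-in approximating setuptools.find_packages(): a folder
    without __init__.py is skipped together with everything below it.
    """
    found = []

    def walk(directory, prefix):
        for child in sorted(directory.iterdir()):
            if child.is_dir() and (child / "__init__.py").is_file():
                name = prefix + child.name
                found.append(name)
                walk(child, name + ".")

    walk(Path(root), "")
    return found


def missing_init(root, spider_modules):
    """Return the SPIDER_MODULES entries that the scan above would not package."""
    packaged = set(packages_with_init(root))
    return [module for module in spider_modules if module not in packaged]


def build_demo_project(root):
    """Create a throwaway layout: spiders_dev has __init__.py, spiders_prod does not."""
    root = Path(root)
    (root / "crawler" / "spiders_prod").mkdir(parents=True)
    (root / "crawler" / "spiders_dev").mkdir()
    (root / "crawler" / "__init__.py").touch()
    (root / "crawler" / "spiders_dev" / "__init__.py").touch()
    (root / "crawler" / "spiders_prod" / "example.py").write_text("NAME = 'example'\n")


SPIDER_MODULES = ["crawler.spiders_prod", "crawler.spiders_dev"]

demo_root = tempfile.mkdtemp()
build_demo_project(demo_root)

# Locally the import succeeds: Python 3 treats the folder as a namespace package.
sys.path.insert(0, demo_root)
importlib.invalidate_caches()
module = importlib.import_module("crawler.spiders_prod.example")
local_import_ok = module.NAME == "example"

# The packaging-style scan, however, does not see the folder at all.
before_fix = missing_init(demo_root, SPIDER_MODULES)

# Apply the fix from the answer: add the (empty) __init__.py file.
(Path(demo_root) / "crawler" / "spiders_prod" / "__init__.py").touch()
after_fix = missing_init(demo_root, SPIDER_MODULES)

print(local_import_ok, before_fix, after_fix)
```

Run against a real project, missing_init(".", SPIDER_MODULES) from the directory containing scrapy.cfg should list the folders that still need an __init__.py before running scrapyd-deploy, assuming the project uses the default find_packages() call shown in the generated setup.py.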