python django scrapy python-3.8 scrapyd

Can't add a .egg file to scrapyd addversion.json


I can't upload my .egg file to scrapyd using:

curl http://127.0.0.1:6800/addversion.json -F project=scraper_app -F version=r1 egg=@scraper_app-0.0.1-py3.8.egg

It returns an error message like this:

{"node_name": "Workspace", "status": "error", "message": "b'egg'"}
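
A minimal sketch of checking such a response in Python instead of reading it by eye. It assumes only what the output above shows: a JSON object with a status field and, on failure, a message field. The helper name scrapyd_result is hypothetical.

```python
import json


def scrapyd_result(text):
    """Parse a scrapyd JSON reply; raise if its status is not "ok"."""
    payload = json.loads(text)
    if payload.get("status") != "ok":
        # On failure scrapyd puts a short description in "message".
        raise RuntimeError(payload.get("message", "unknown scrapyd error"))
    return payload
```

The message b'egg' reads like the string form of a missing key, so it appears to indicate that the request reached scrapyd without an egg form field.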

I'm using Django and Scrapy in the same project, with this folder structure:

my_app/
-- apps/  # django apps folder
   -- crawler/ 
      -- __init__.py
      -- admin.py
      -- apps.py
      -- etc..
   -- pages/
      -- __init__.py
      -- admin.py
      -- apps.py
      -- etc..
-- my_app/  # django project folder
   -- __init__.py
   -- asgi.py
   -- settings.py
   -- etc..
-- scraper_app/ # scrapy dir
   -- scraper_app/ # scrapy project folder
      -- spiders/
         -- abc_spider.py
      -- __init__.py
      -- middlewares.py
      -- pipelines.py
      -- settings.py
      -- etc..
   -- scrapy.cfg
-- manage.py
-- scrapyd.conf
-- setup.py  # setuptools for creating the egg file
-- etc..

Here is what my setup.py looks like:

from setuptools import setup, find_packages

setup(
    name="scraper_app",
    version="1.0.0",
    author="Khrisna Gunanasurya",
    author_email="[email protected]",
    description="Create egg file from 'scraper_app'",
    packages=find_packages(where=['scraper_app'])
)
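
For comparison, the setup.py that scrapyd-deploy generates declares a scrapy entry point naming the settings module, and scrapyd looks that entry point up when it loads an egg. The setup.py above declares none. A minimal sketch, using only the standard library, that checks whether a built egg carries the entry point; the function name egg_settings_module is hypothetical.

```python
import configparser
import zipfile


def egg_settings_module(egg_path):
    """Return the module named by the egg's [scrapy] settings entry point, or None."""
    with zipfile.ZipFile(egg_path) as egg:
        try:
            raw = egg.read("EGG-INFO/entry_points.txt").decode("utf-8")
        except KeyError:
            # No entry_points.txt: setup() declared no entry points at all.
            return None
    # entry_points.txt uses INI syntax, e.g. "[scrapy]" / "settings = pkg.settings".
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(raw)
    return parser.get("scrapy", "settings", fallback=None)
```

A result of None for dist/scraper_app-0.0.1-py3.8.egg would mean the egg was built without the entry point, which can be declared with entry_points={'scrapy': ['settings = scraper_app.settings']} in setup().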

My scrapyd.conf file:

[scrapyd]
eggs_dir    = eggs
logs_dir    = logs
logs_to_keep = 5
dbs_dir     = dbs
max_proc    = 0
max_proc_per_cpu = 4
http_port   = 6800
debug       = off
runner      = scrapyd.runner
application = scrapyd.app.application

And my scrapy.cfg content:

[settings]
default = scraper_app.settings

[deploy]
url = http://127.0.0.1:6800/
project = scraper_app

What I want is to add an .egg file to scrapyd through addversion.json. Here are the steps I follow:

  1. Run py setup.py bdist_egg
  2. An .egg file is generated in the dist/ folder, named scraper_app-0.0.1-py3.8.egg
  3. cd to the dist/ folder
  4. Run curl http://127.0.0.1:6800/addversion.json -F project=scraper_app -F version=r1 -F egg=@scraper_app-0.0.1-py3.8.egg

After step 4 I get the error message shown above. If I instead run curl from the root directory, with something like curl http://127.0.0.1:6800/addversion.json -F project=scraper_app -F version=r1 -F egg=@dist\scraper_app-0.0.1-py3.8.egg (I'm using Windows), it returns this error:

curl: (6) Could not resolve host: dist\scraper_app-0.0.1-py3.8.egg
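
The same upload can be sketched in Python with only the standard library, which avoids depending on how the Windows shell hands the @dist\... argument to curl. This is a sketch, not scrapyd's own client: the helper names build_multipart and add_version are hypothetical, while the form fields project, version and egg are the ones used in the curl command.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, egg_path):
    """Encode text fields plus the egg file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    egg_path = Path(egg_path)
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The egg goes in as a file part named "egg", like curl's -F egg=@file.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="egg"; filename="{egg_path.name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + egg_path.read_bytes()
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def add_version(base_url, project, version, egg_path):
    """POST the egg to addversion.json and return the raw JSON reply text."""
    body, content_type = build_multipart(
        {"project": project, "version": version}, egg_path
    )
    request = urllib.request.Request(
        base_url.rstrip("/") + "/addversion.json",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With scrapyd running, a call such as add_version("http://127.0.0.1:6800", "scraper_app", "r1", Path("dist") / "scraper_app-0.0.1-py3.8.egg") works from the root directory, because pathlib builds the path with the right separator for the platform.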

I have already googled it, but I can't find how to solve this or which step I got wrong. I also tried building the .egg file directly from the scraper_app project folder, but that did not work either.

Can someone tell me what's wrong with my project, or what I'm doing wrong here?

Thank you.


Solution

  • After googling further, I tried scrapyd-client, but it has a lot of problems on Windows and scrapyd-deploy is not easy to use there. I then found a YouTube video that shows the correct way to install scrapyd-client.

    Here is the correct way to install it.

    Make sure you are inside a virtualenv, then install scrapyd-client with pip install git+https://github.com/scrapy/scrapyd.git. It installs without any errors or difficulties.

    Then you can just run scrapyd-deploy in the Scrapy project folder.
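
To confirm that a deploy actually registered a version, scrapyd's listversions.json endpoint can be queried. A minimal sketch using only the standard library; the helper names listversions_url and deployed_versions are hypothetical, and the base URL is the one from scrapy.cfg.

```python
import json
import urllib.parse
import urllib.request


def listversions_url(base_url, project):
    """Build the listversions.json URL for one project."""
    query = urllib.parse.urlencode({"project": project})
    return f"{base_url.rstrip('/')}/listversions.json?{query}"


def deployed_versions(base_url, project):
    """Return the list of versions scrapyd holds for the project."""
    with urllib.request.urlopen(listversions_url(base_url, project)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    if payload.get("status") != "ok":
        raise RuntimeError(payload.get("message", "unknown scrapyd error"))
    return payload.get("versions", [])
```

For example, deployed_versions("http://127.0.0.1:6800/", "scraper_app") should include the newly deployed version when scrapyd is running and the deploy succeeded.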