
Difference between Celery and Scrapyd


I have built a small Scrapy spider using Portia. I have deployed it to Scrapyd and it is working fine.

After searching, I found that we can use Celery to schedule the spider.

What is actually the difference between Scrapyd and Celery?

Can anyone help me?

Thanks.


Solution

  • Scrapyd is focused mainly on deploying Scrapy spiders, while Celery is a generic framework for running asynchronous tasks in a distributed and scalable manner.

    You can do one with the other, but Scrapy, as you know, focuses on scraping the web, whereas with Celery you define your own task.

    scrapy + scrapyd: Scrapyd is built for Scrapy. When you "deploy" a new spider, it is roughly like running scrapy crawl myspider. Scrapyd also provides a web service to upload and start new spiders, plus some more features.

    scrapy + celery: The Celery task that you would need to implement does basically the same things that Scrapyd already provides. The main advantage of this approach, IMO, is that if you eventually have requirements that Scrapyd can't meet, they will be easier to implement with Celery, because in Celery you define your own task.

    From Celery:

    Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
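To make the "define your own task" point concrete, here is a minimal sketch of a Celery task that shells out to scrapy crawl. The app name `crawler`, the Redis broker URL, and the helper names `build_crawl_command`, `run_spider`, and `run_spider_task` are assumptions for illustration, not something from the question. The Celery import is guarded only so the plain helpers stay usable without the package installed.

```python
import subprocess

try:
    from celery import Celery  # third-party: pip install celery
except ImportError:  # the plain helpers below still work without Celery
    Celery = None


def build_crawl_command(spider_name):
    """Return the command line that starts one spider."""
    return ["scrapy", "crawl", spider_name]


def run_spider(spider_name):
    """Run the spider in a child process and return its exit code.

    A separate process is used so that each run gets a fresh Twisted
    reactor. The worker must be started from inside the Scrapy project
    directory so that `scrapy crawl` can find the spider.
    """
    result = subprocess.run(build_crawl_command(spider_name), check=False)
    return result.returncode


if Celery is not None:
    # Assumption: a Redis broker on localhost; use whatever broker you have.
    app = Celery("crawler", broker="redis://localhost:6379/0")
    # Register the plain function as a Celery task.
    run_spider_task = app.task(name="crawler.run_spider")(run_spider)
```

With a worker running, `run_spider_task.delay("myspider")` would queue one crawl. Anything Scrapyd does not offer (custom retries, chaining with other jobs, and so on) would go into the body of `run_spider`.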

    From Scrapyd:

    Scrapyd is an application for deploying and running Scrapy spiders. It enables you to deploy (upload) your projects and control their spiders using a JSON API.
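For comparison, starting a spider through Scrapyd's JSON API is a single POST to its schedule.json endpoint. The sketch below uses only the standard library. The base URL (Scrapyd's default port 6800 on localhost), the project name `myproject`, and the function names are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(base_url, project, spider):
    """Build the POST request for Scrapyd's schedule.json endpoint."""
    data = urlencode({"project": project, "spider": spider}).encode("ascii")
    url = base_url.rstrip("/") + "/schedule.json"
    return Request(url, data=data, method="POST")


def schedule_spider(base_url, project, spider):
    """Send the request and return Scrapyd's decoded JSON reply."""
    with urlopen(build_schedule_request(base_url, project, spider)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumes a Scrapyd instance is running locally with the project deployed.
    print(schedule_spider("http://localhost:6800", "myproject", "myspider"))
```

Here Scrapyd handles the queueing and process management itself, which is the part you would otherwise write by hand in a Celery task.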