
Scrapyd vs Windows Task Scheduler


I want to run a small set of Scrapy spiders on an Azure virtual machine, and I'm looking for a way to automate them. For now, Windows Task Scheduler seems sufficient for running 3-5 spiders on a single VM instance. My only concern is whether I can get it to run these few spiders in parallel.
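One way to get parallelism from a single scheduled task is to have it start a small Python launcher that spawns one `scrapy crawl` process per spider. This is a minimal sketch, assuming the script runs from the Scrapy project directory and Scrapy is installed in the same interpreter (`python -m scrapy` invokes the Scrapy command line). The function names and spider names are hypothetical. Scrapy's own `CrawlerProcess` can also run several spiders inside one process; separate OS processes are used here so that one failing spider does not take the others down.

```python
import subprocess
import sys

def build_crawl_command(spider_name):
    """Command line for one spider; equivalent to `scrapy crawl <name>`."""
    return [sys.executable, "-m", "scrapy", "crawl", spider_name]

def run_in_parallel(spider_names, make_command=build_crawl_command, cwd=None):
    """Start one OS process per spider, then wait for all of them.

    Every process is launched before any wait() call, so the spiders run
    concurrently. Returns the exit codes in the same order as spider_names.
    """
    processes = [subprocess.Popen(make_command(name), cwd=cwd) for name in spider_names]
    return [process.wait() for process in processes]

# Hypothetical usage from a scheduled task, run inside the Scrapy project directory:
# exit_codes = run_in_parallel(["spider_a", "spider_b", "spider_c"])
# sys.exit(max(exit_codes, default=0))
```

A single Task Scheduler entry can then run this script, instead of one entry per spider.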

If Task Scheduler can already run spiders in parallel, what would be the long-term advantages of using Scrapyd instead, for example if the scope eventually grows to around 100 spiders? Alternatively, a few virtual machines, each with its own Task Scheduler, might do the job as well. I'm trying to stay away from Linux because of other development work on Windows, and I've also seen some concerns about using Scrapyd on Windows.
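For comparison, Scrapyd exposes an HTTP JSON API: a client queues a spider by POSTing `project` and `spider` to the `schedule.json` endpoint, and Scrapyd decides how many crawl processes run at once. This is a minimal standard-library sketch, assuming Scrapyd is listening on its default port 6800 on the same machine and the project has already been deployed under the hypothetical name `myproject`. The helper function names are also hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Scrapyd runs locally on its default port.
SCRAPYD_URL = "http://localhost:6800"

def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build a POST request for Scrapyd's schedule.json endpoint."""
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode("ascii")
    return urllib.request.Request(f"{base_url}/schedule.json", data=data, method="POST")

def schedule_all(project, spiders, base_url=SCRAPYD_URL):
    """Queue every spider and return {spider: jobid} from Scrapyd's responses."""
    job_ids = {}
    for spider in spiders:
        with urllib.request.urlopen(build_schedule_request(project, spider, base_url)) as resp:
            payload = json.load(resp)
        job_ids[spider] = payload.get("jobid")
    return job_ids

# Hypothetical usage:
# schedule_all("myproject", ["spider_a", "spider_b", "spider_c"])
```

With Scrapyd, concurrency is governed by its `max_proc` and `max_proc_per_cpu` configuration settings rather than by how many scheduled tasks exist. This matters more as the spider count grows toward 100.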


Solution

  • I ran into issues when trying to set up Scrapyd on Windows, specifically with Scrapyd-Client. I contacted the project's developers, and they told me that Windows support for the Scrapyd server is not part of the project. That is interesting, because I actually managed to run the server without any particular issues (it installs with pip); I just could not get Scrapyd-Client working properly. Some people on Stack Overflow say they have run Scrapyd on Windows, but the tool is clearly Linux-first, so I would not push against the current.

    Scrapy is a great tool, but getting Scrapyd to work on Windows was not the experience I expected, at least for me. For the time being I will use a basic Windows Task Scheduler setup.
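The scheduled task itself can be registered from a script instead of through the Task Scheduler UI. This is a minimal sketch that builds a `schtasks /Create` command to run a Python script once a day. The task name, script path, start time and helper names are hypothetical. Quoting of `/TR` paths that contain spaces may need adjusting on a given system.

```python
import subprocess
import sys

def build_schtasks_command(task_name, script_path, start_time="02:00"):
    """Arguments for schtasks.exe that run `script_path` once a day."""
    # Quote both paths in case they contain spaces.
    task_run = f'"{sys.executable}" "{script_path}"'
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # name shown in Task Scheduler
        "/TR", task_run,     # command line the task executes
        "/SC", "DAILY",      # schedule type
        "/ST", start_time,   # start time, HH:mm in 24-hour format
        "/F",                # replace an existing task with the same name
    ]

def register_task(task_name, script_path, start_time="02:00"):
    """Create (or overwrite) the task; schtasks only exists on Windows."""
    if sys.platform != "win32":
        raise OSError("schtasks is only available on Windows")
    subprocess.run(build_schtasks_command(task_name, script_path, start_time), check=True)

# Hypothetical usage:
# register_task("RunScrapySpiders", r"C:\scrapers\run_spiders.py")
```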