
Scrapy managing dynamic spiders


I am building a project where I need a web crawler which crawls a list of different webpages. This list can change at any time. How is this best implemented with scrapy? Should I create one spider for all websites or dynamically create spiders?

I have read about scrapyd, and I guess that dynamically creating spiders is the best approach. I would need a hint about how to implement it though.


Solution

  • If the parsing logic is the same for every site, one spider is enough, and there are two ways to feed it URLs:

    1. For a large number of webpages, keep the URLs in a list (for example, a file) and read that list when the spider starts, either in the constructor or in the start_requests method, then assign it to start_urls.
    2. Pass the webpage URL to the spider as a command-line argument. The constructor or the start_requests method can read that argument in the same way and assign it to start_urls.

    Passing parameters in Scrapy:

        scrapy crawl spider_name -a start_url=your_url
    

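    With scrapyd, spider arguments are sent as form fields to the schedule.json endpoint, so with curl you replace -a with -d (for example, -d start_url=your_url).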
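
If you schedule runs from code instead of curl, the same request can be built with the Python standard library. This is a sketch under assumptions: scrapyd is taken to be listening on its default address `http://localhost:6800`, and the project name `myproject` and spider name `multi_site` are hypothetical. Any extra keyword argument is sent as a form field, which scrapyd passes on to the spider as an argument.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_schedule_request(project, spider, base_url="http://localhost:6800", **spider_args):
    """Build a POST request for scrapyd's schedule.json endpoint."""
    # project and spider are required fields; everything else becomes a spider argument.
    fields = {"project": project, "spider": spider, **spider_args}
    body = urlencode(fields).encode("utf-8")
    return Request(f"{base_url}/schedule.json", data=body, method="POST")


def schedule_spider(project, spider, **spider_args):
    """Send the request and return scrapyd's raw JSON reply as text."""
    request = build_schedule_request(project, spider, **spider_args)
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `schedule_spider("myproject", "multi_site", start_url="https://example.com")` once per URL in your list would queue one job per site, assuming a scrapyd instance is running and the project has been deployed to it.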