web-scraping, web-crawler, apify

Can an Apify project contain several crawlers?


I searched the documentation but wasn't able to find any article about this.

I want to know whether I can define several crawlers in one Apify project, just as you can have several Spiders in a Scrapy project, or whether I have to create a new project for each new website that I'd like to crawl.

I would appreciate any response, thank you in advance!


Solution

  • Yes, you can create as many crawler instances as you need.

    It's usually a good idea to separate concerns: for example, crawl the sitemap with its own CheerioCrawler/BasicCrawler instance, with its own settings and a dedicated queue, then run the full scraper with whichever crawler you prefer, such as PuppeteerCrawler, also with its own queue if needed.

    You can choose to run them in parallel with

    await Promise.all([
        crawler1.run(),
        crawler2.run(),
    ]);
    

    or one at a time, using

    await crawler1.run();
    await crawler2.run();
    

    The caveat with Promise.all is that if both crawlers read from and write to the same key-value store, you might hit race conditions. If they don't share any state, you should be good to go.
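
To make the race-condition caveat concrete, here is a minimal sketch in plain JavaScript. It does not use the Apify API: `FakeStore`, `fakeRun`, and the `pagesSeen` key are hypothetical stand-ins for a shared key-value store and for two crawlers that each do a read-modify-write on the same key.

```javascript
// Simulated I/O latency so that reads and writes from both runs can interleave.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical in-memory stand-in for a shared key-value store (NOT the Apify API).
class FakeStore {
    constructor() {
        this.data = new Map();
    }

    async getValue(key) {
        await sleep(5);
        return this.data.get(key);
    }

    async setValue(key, value) {
        await sleep(5);
        this.data.set(key, value);
    }
}

// Hypothetical stand-in for crawler.run(): increments a shared counter once per page.
async function fakeRun(store, pages) {
    for (let i = 0; i < pages; i++) {
        const current = (await store.getValue('pagesSeen')) || 0;
        // Another run may have written a newer value since the read above;
        // this write then overwrites it (a lost update).
        await store.setValue('pagesSeen', current + 1);
    }
}

// Both runs share one store and overlap in time.
async function runParallel() {
    const store = new FakeStore();
    await Promise.all([fakeRun(store, 3), fakeRun(store, 3)]);
    return store.getValue('pagesSeen');
}

// The second run starts only after the first has finished.
async function runSequential() {
    const store = new FakeStore();
    await fakeRun(store, 3);
    await fakeRun(store, 3);
    return store.getValue('pagesSeen');
}
```

Run one after the other, the two runs count all 6 pages. Run with Promise.all, they overwrite each other's updates and the final count comes out lower. Giving each crawler its own key or its own store avoids the problem.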