javascript web-scraping apify crawlee

crawlee - How to add the same URL back to the requestQueue


How do I enqueue the same URL that I am currently handling a request for? I have the code below and want to scrape the same URL again (possibly after a delay). I added environment variables so that cached results are deleted, according to this answer.

import { RequestQueue, CheerioCrawler, Configuration } from "crawlee";

const config = Configuration.getGlobalConfig();
config.set('persistStorage', false);
config.set('purgeOnStart', false);

const requestQueue = await RequestQueue.open();
await requestQueue.addRequest({ url: "https://www.google.com/" });

const crawler = new CheerioCrawler({
    requestQueue,
    async requestHandler({ $, request }) {
        console.log("Do something with scraped data...");
        await crawler.addRequests([{url: "https://www.google.com/"}]);
    }
});

await crawler.run();

Solution

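  • I found a solution: adding a uniqueKey to the request object solves this problem. Crawlee deduplicates requests by uniqueKey, which defaults to the normalized URL, so a second request for the same URL is otherwise treated as already handled and skipped. Any value that differs per request works as the key, for example a counter that is incremented each time before a new request is queued: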

    {url: "https://www.google.com/", uniqueKey: counter.toString()}
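To make the counter approach concrete, here is a minimal sketch of how the keys could be generated. The helper names createRepeatRequestFactory and nextRequest are hypothetical (they are not part of Crawlee), and prefixing the key with the URL is an assumption added so that keys stay distinct if more than one URL is re-queued; the answer above uses the bare counter.

```javascript
// Hypothetical helper (not part of Crawlee): returns a function that builds
// request objects whose uniqueKey differs on every call, so the request queue
// does not treat a repeated URL as a duplicate.
function createRepeatRequestFactory() {
    let counter = 0; // incremented each time before a new request is built

    return function nextRequest(url) {
        counter += 1;
        // Assumption: prefix the counter with the URL so keys also stay
        // distinct across different URLs. The bare counter works too.
        return { url, uniqueKey: `${url}#${counter}` };
    };
}

const nextRequest = createRepeatRequestFactory();

const first = nextRequest("https://www.google.com/");
const second = nextRequest("https://www.google.com/");

// Same URL, different keys.
console.log(first.uniqueKey);  // https://www.google.com/#1
console.log(second.uniqueKey); // https://www.google.com/#2
```

Inside the requestHandler from the question, the plain { url } object could then be replaced with a call such as crawler.addRequests([nextRequest(request.url)]), so each re-queued request carries a fresh key.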