I have the following scenario:
Currently I have all 30 URLs in a request queue (added through the Apify web interface), and I'm trying to detect when they have all finished.
But since they all run asynchronously, the pending request count is never reliable:
const queue = await Apify.openRequestQueue();
// getInfo() resolves to an info object, so pull the count out of it
const { pendingRequestCount } = await queue.getInfo();
There are two reasons why I need that last URL to be separate:
Edit: I tried this based on the answer from @Lukáš Křivka. handledRequestCount in the while loop reaches a maximum of 2, never 4, and Puppeteer just ends normally. I put the "return" inside the while loop because otherwise the requests never finish (of course).
In my current test setup I have 4 URLs to be scraped (entered in the Start URLs input field of Puppeteer Scraper on Apify.com) and this code:
let title = "";
const queue = await Apify.openRequestQueue();
let { handledRequestCount } = await queue.getInfo();
while (handledRequestCount < 4) {
    await new Promise((resolve) => setTimeout(resolve, 2000)); // wait for 2 secs
    handledRequestCount = await queue.getInfo().then((info) => info.handledRequestCount);
    console.log(`Currently handled here: ${handledRequestCount} --- waiting`); // this never goes above 2
    title = await page.evaluate(() => { return $('h1').text(); });
    return { title };
}
log.info("Here I want to add another URL to the queue where I can do ajax stuff to save results from above runs to firebase db");
title = await page.evaluate(() => { return $('h1').text(); });
return { title };
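Why the loop in the edit cannot reach 4: as far as I can tell from the Apify SDK's BasicCrawler, a request is marked as handled only after the page function for it has returned, so a page function that waits for handledRequestCount to include every request is partly waiting for itself. The sketch below is a hypothetical in-memory model of that behavior (ToyQueue, crawl and makeWaitingHandler are made-up names, not Apify APIs): each handler polls until all requests are handled, gives up after a fixed number of polls, and records the count it saw.

```javascript
// Hypothetical in-memory stand-in for a request queue (not the Apify API).
class ToyQueue {
  constructor(urls) {
    this.pending = [...urls];
    this.handledCount = 0;
  }

  getInfo() {
    return { handledRequestCount: this.handledCount };
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Processes queued URLs with `concurrency` parallel workers. Like a crawler,
// it marks a request handled only after its handler has resolved.
async function crawl(queue, handler, concurrency) {
  const worker = async () => {
    while (queue.pending.length > 0) {
      const url = queue.pending.shift();
      await handler(url, queue);
      queue.handledCount += 1;
    }
  };
  await Promise.all(Array.from({ length: concurrency }, worker));
}

// Mimics the page function in the edit: poll until `total` requests are
// handled, but give up after `maxPolls` polls and record what was seen.
function makeWaitingHandler(total, maxPolls, log) {
  return async (url, queue) => {
    for (let i = 0; i < maxPolls; i++) {
      if (queue.getInfo().handledRequestCount >= total) return;
      await sleep(5);
    }
    log.push({ url, handled: queue.getInfo().handledRequestCount });
  };
}

// Usage: four URLs, two at a time. Every handler gives up, because its own
// request is still in progress while it waits.
const seen = [];
crawl(new ToyQueue(['a', 'b', 'c', 'd']), makeWaitingHandler(4, 10, seen), 2)
  .then(() => console.log(seen)); // every "handled" value is below 4
```

Polling longer does not help, since the count a handler sees can never include its own request. The answer below sidesteps this by tracking completion itself: a page function marks its own URL done before checking whether everything is done, so the last one to run does see the full set.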
Because I was not able to get consistent results with handledRequestCount from getInfo() (see the edit in my original question), I went another route.
I'm basically keeping a record in the key-value store of which URLs have already been scraped.
const queue = await Apify.openRequestQueue();
const urls = [
    { done: false, label: "vietnam", url: "https://en.wikipedia.org/wiki/Vietnam" },
    { done: false, label: "cambodia", url: "https://en.wikipedia.org/wiki/Cambodia" },
];
// Loop over the array and add each URL to the queue
for (const { url } of urls) {
    await queue.addRequest(new Apify.Request({ url }));
}
// Push the array to the key-value store under the key 'URLS'
await Apify.setValue('URLS', urls);
Now, every time I've processed a URL, I set its "done" value to true. When they are all true, I push another (final) URL into the queue:
await queue.addRequest(new Apify.Request({ url: "http://www.placekitten.com" }));
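The mark-and-check step described above can be sketched as two small pure functions. markDone and allDone are hypothetical helpers, not part of the Apify SDK; the commented wiring assumes the page function has access to Apify, request and queue, and only the helpers themselves are executed here.

```javascript
// Returns a copy of `records` with the entry for `url` marked as done.
function markDone(records, url) {
  return records.map((record) => (record.url === url ? { ...record, done: true } : record));
}

// True once every tracked URL has been processed; false for an empty list.
function allDone(records) {
  return records.length > 0 && records.every((record) => record.done);
}

// Hypothetical wiring inside the page function (not executed here):
//   const stored = (await Apify.getValue('URLS')) || [];
//   const updated = markDone(stored, request.url);
//   await Apify.setValue('URLS', updated);
//   if (allDone(updated)) {
//     await queue.addRequest(new Apify.Request({ url: "http://www.placekitten.com" }));
//   }

// Usage with the two records from the answer:
let records = [
  { done: false, label: "vietnam", url: "https://en.wikipedia.org/wiki/Vietnam" },
  { done: false, label: "cambodia", url: "https://en.wikipedia.org/wiki/Cambodia" },
];
records = markDone(records, "https://en.wikipedia.org/wiki/Vietnam");
console.log(allDone(records)); // false
records = markDone(records, "https://en.wikipedia.org/wiki/Cambodia");
console.log(allDone(records)); // true
```

allDone returns false for an empty list so that a missing or empty record never triggers the final URL. Note that reading 'URLS', updating it and writing it back is not atomic: if two page functions finish at the same moment, one can overwrite the other's update and a URL may never be marked done. Running the scraper with a concurrency of 1 avoids that at the cost of speed. Adding the final URL twice should be harmless, since the request queue deduplicates requests by their unique key, which by default is derived from the URL.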