Here is the scenario: I'm using the Cheerio Scraper to scrape a website containing real estate listings.
Each listing has a link to the next one, so before scraping the current page I add the next page to the request queue. What always happens, at some random point, is that the scraper stops for no apparent reason, even though the next page to scrape is still in the queue (I attach an image).
Why does this happen when there is still a pending request in the queue? Many thanks.
Here is the message I get:
2021-02-28T10:52:35.439Z INFO CheerioCrawler: All the requests from request list and/or request queue have been processed, the crawler will shut down.
2021-02-28T10:52:35.672Z INFO CheerioCrawler: Final request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":963,"requestsFinishedPerMinute":50,"requestsFailedPerMinute":0,"requestTotalDurationMillis":22143,"requestsTotal":23,"crawlerRuntimeMillis":27584,"requestsFinished":23,"requestsFailed":0,"retryHistogram":[23]}
2021-02-28T10:52:35.679Z INFO Cheerio Scraper finished.
Here is the request queue:
Here is the code:
async function pageFunction(context) {
    const { $, request, log } = context;
    // The "$" property contains the Cheerio object which is useful
    // for querying DOM elements and extracting data from them.
    const pageTitle = $('title').first().text();
    // The "request" property contains various information about the web page loaded.
    const url = request.url;
    // Use "log" object to print information to actor log.
    log.info('Scraping Page', { url, pageTitle });

    // Adding next page to the queue
    var baseUrl = '...';
    if ($('div.d3-detailpager__element--next a').length > 0) {
        var nextPageUrl = $('div.d3-detailpager__element--next a').attr('href');
        log.info('Found another page', { nextUrl: baseUrl.concat(nextPageUrl) });
        context.enqueueRequest({ url: baseUrl.concat(nextPageUrl) });
    }

    // My code for scraping follows here
    return { /* my scraped object */ };
}
The problem is a missing await. context.enqueueRequest() is asynchronous and returns a promise; without awaiting it, pageFunction can return before the next page has actually been added to the queue. The crawler then sees no pending requests and shuts down, which is why the run stops at a seemingly random point:

await context.enqueueRequest({ url: baseUrl.concat(nextPageUrl) });
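
For reference, here is the same pageFunction with the fix applied (baseUrl is left as '...' exactly as in the original):

async function pageFunction(context) {
    const { $, request, log } = context;
    const pageTitle = $('title').first().text();
    const url = request.url;
    log.info('Scraping Page', { url, pageTitle });

    var baseUrl = '...';
    if ($('div.d3-detailpager__element--next a').length > 0) {
        var nextPageUrl = $('div.d3-detailpager__element--next a').attr('href');
        log.info('Found another page', { nextUrl: baseUrl.concat(nextPageUrl) });
        // Awaiting guarantees the next page is in the queue before
        // pageFunction returns and the crawler checks for pending work.
        await context.enqueueRequest({ url: baseUrl.concat(nextPageUrl) });
    }

    // My code for scraping follows here
    return { /* my scraped object */ };
}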