javascript, web-scraping, apify

Apify web scraper task not stable. Getting different results between runs minutes apart


I'm building a very simple scraper to get the 'now playing' info from an online radio station I like to listen to.

It's stored in a simple p element on their site (screenshot: data html location).

Now, using the standard apify/web-scraper, I run into a strange issue. With this code, the scraping sometimes works and sometimes doesn't:

async function pageFunction(context) {
    const { request, log, jQuery } = context;
    const $ = jQuery;
    const nowPlaying = $('p.js-playing-now').text();
    return {
        nowPlaying
    };
}

If the scraper works I get this result: [{"nowPlaying": "Hangover Hotline - hosted by Lamebrane"}]

But if it doesn't I get this: [{"nowPlaying": ""}]

And there is only a 5-minute difference between the two scrapes. The website doesn't change; the data is always presented in the same way. I tried checking all the boxes to circumvent security, and different combinations of options (Use Chrome, Use Stealth, Ignore SSL errors, Ignore CORS and CSP), but unfortunately that doesn't seem to fix it (screenshot: Scraping instable).

Any suggestions on how I can get this scraping task to consistently return the data I need?


Solution

  • It would be great if you could attach the URL; it would help me find the problem.

    Based on the information you provided, I guess that the data you want is loaded asynchronously, so the element is sometimes still empty when the page function runs. You can use the context.waitFor() function to wait for it.

    async function pageFunction(context) {
        const { request, log, jQuery } = context;
        const $ = jQuery;
        // Wait until the element contains non-empty text before reading it.
        await context.waitFor(() => !!$('p.js-playing-now').text());
        const nowPlaying = $('p.js-playing-now').text();
        return {
            nowPlaying
        };
    }
    

    You can pass a function to waitFor(), and it will wait until that function returns true. You can check the docs for details.
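
To illustrate the idea behind waiting on a function, here is a minimal sketch of polling a condition until it becomes truthy. The helper name waitUntil and its timeoutMs and intervalMs options are hypothetical and chosen only for this example; this is not Apify's implementation of context.waitFor(), just the general pattern, with a timer standing in for the page's asynchronous update.

```javascript
// Hypothetical helper (not part of Apify): poll a predicate until it
// returns a truthy value, or throw once the timeout has elapsed.
async function waitUntil(predicate, { timeoutMs = 5000, intervalMs = 50 } = {}) {
    const deadline = Date.now() + timeoutMs;
    while (true) {
        const result = await predicate();
        if (result) return result; // condition met: hand back the value
        if (Date.now() >= deadline) {
            throw new Error(`Condition not met within ${timeoutMs} ms`);
        }
        // Sleep briefly before checking again.
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
}

// Simulated page: the text only appears after a short delay, much like
// content that a site fills in asynchronously after the initial load.
async function demo() {
    let nowPlaying = '';
    setTimeout(() => {
        nowPlaying = 'Hangover Hotline - hosted by Lamebrane';
    }, 200);
    // Reading nowPlaying immediately would give '', so poll instead.
    return waitUntil(() => nowPlaying);
}
```

Calling demo() resolves with the text once it has been set. Reading the value without waiting is what would produce the empty string seen in the failing runs, assuming the page really does fill in the element after load.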