puppeteer apify

Access and Loop External Datasource with Apify Puppeteer Scraper


The Apify Puppeteer Scraper does not expose jQuery in the context object. I need to access an external JSON data source within the Puppeteer Scraper pageFunction and then loop over one of its nodes. Here is what I would do if jQuery were available:

$.get(urlAPI, function(data) {
    $.each(data.feed.entry, function(index, value) {
        var url = value.URL;
    });
});

Solution

  • Because the pageFunction runs in the Node.js context rather than in the browser, jQuery is not available there. You can inject jQuery into the page with the Apify SDK and then use it inside page.evaluate().

    async function pageFunction(context) {
        const { page, request, log, Apify } = context;
        await Apify.utils.puppeteer.injectJQuery(page);
        const title = await page.evaluate(() => {
            // jQuery is available here because it was injected with injectJQuery
            return $('title').text();
        });
        return {
            title,
        }
    }
    

    EDIT: To fetch the external JSON directly from the Node.js context, without jQuery, use requestAsBrowser:

    async function pageFunction(context) {
        const { page, request, log, Apify } = context;
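        // Fetch the external data source from Node.js with browser-like headers
        const response = await Apify.utils.requestAsBrowser({ url: "http://example.com" });
        // The body arrives as a string, so parse it into an object
        const data = JSON.parse(response.body);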
        return {
            data,
        }
    }
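
Once the JSON is parsed, the loop from the question can be written in plain JavaScript, with no jQuery needed. The following is a minimal sketch, assuming the data has the `feed.entry` array and `URL` field shown in the question's snippet. The helper name `extractEntryUrls` and the sample data are made up for illustration.

```javascript
// Hypothetical helper: collects the URL of every entry in a feed-shaped object.
// The `feed.entry` path and the `URL` field are taken from the question's
// jQuery snippet; the real data source may be shaped differently.
function extractEntryUrls(data) {
    const entries = (data && data.feed && Array.isArray(data.feed.entry))
        ? data.feed.entry
        : [];
    const urls = [];
    for (const entry of entries) {
        // Skip entries that do not carry a string URL.
        if (entry && typeof entry.URL === 'string') {
            urls.push(entry.URL);
        }
    }
    return urls;
}

// Made-up sample standing in for JSON.parse(response.body).
const sampleData = {
    feed: {
        entry: [
            { URL: 'http://example.com/a' },
            { URL: 'http://example.com/b' },
            { title: 'entry without a URL' },
        ],
    },
};

console.log(extractEntryUrls(sampleData));
```

Inside the pageFunction, the same helper could be called on the parsed `data` object and its result returned, or each URL could be handled inside the `for...of` loop directly.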