puppeteer, apify

page.on('response') is not accessible in handlePageFunction // PuppeteerCrawler (Apify SDK)


I'm trying to collect some data from the page.on('response') event. This data should then be pushed into the dataset with pushData.

It seems that these event listeners:

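// page.on() is synchronous and returns the page, so the calls can be chained without await.
page
    .on('response', (response) => {
        // Record only responses that came back as 404.
        if (response.status() === 404) {
            responseErrors.push({
                status: response.status(),
                url: response.url(),
            });
        }
    })
    .on('pageerror', (err) => {
        // Uncaught exceptions thrown inside the page.
        if (err.message) {
            pageErrors.push(JSON.stringify(err.message));
        }
    })
    .on('console', (message) => {
        // Messages the page writes to the browser console.
        consoleErrors.push({
            type: message.type(),
            text: message.text(),
        });
    });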

never fire when they are registered in handlePageFunction.

If I add them to the gotoFunction of PuppeteerCrawler, I get results. The problem is that I can't push into the same dataset from there.

So what would be the right way to access this data?


Solution

  • Yes, it doesn't work in handlePageFunction because by the time it runs, the page has already been opened and its responses have already been processed, so listeners attached there miss those events. You have two options:

    1. Use the response parameter of handlePageFunction: https://sdk.apify.com/docs/typedefs/puppeteer-handle-page-inputs

    2. Keep the listeners in gotoFunction as you did, but instead of pushing to the dataset there, store the collected data in request.userData. Then read it in handlePageFunction, merge it with your other data, and push the result to the dataset.
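
Option 2 could look roughly like the sketch below. It is only an illustration, not tested against a specific Apify SDK version: attachCollectors and buildResult are hypothetical helper names, and the key userData.collected is an arbitrary choice. It assumes the same request object is passed to both gotoFunction and handlePageFunction, which is what the answer above relies on.

```javascript
// Hypothetical helper: registers the listeners BEFORE navigation and
// stores everything they collect on request.userData.
function attachCollectors(page, request) {
    const collected = { responseErrors: [], pageErrors: [], consoleMessages: [] };
    request.userData.collected = collected; // "collected" is an assumed key name

    page.on('response', (response) => {
        if (response.status() === 404) {
            collected.responseErrors.push({ status: response.status(), url: response.url() });
        }
    });
    page.on('pageerror', (err) => {
        if (err.message) collected.pageErrors.push(err.message);
    });
    page.on('console', (message) => {
        collected.consoleMessages.push({ type: message.type(), text: message.text() });
    });
    return collected;
}

// Meant to be passed as the gotoFunction option of PuppeteerCrawler.
async function gotoFunction({ page, request }) {
    attachCollectors(page, request);
    return page.goto(request.url);
}

// Hypothetical helper: merges the data scraped in handlePageFunction
// with whatever the listeners collected during navigation.
function buildResult(request, pageData) {
    return { url: request.url, ...pageData, ...request.userData.collected };
}
```

In handlePageFunction you would then do something like `await Apify.pushData(buildResult(request, { title: await page.title() }))`, so everything ends up in one dataset item per page.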