javascript, node.js, typescript, puppeteer

Puppeteer - waitForResponse() timeout, but page.on('response') finds response


I'm trying to get an XHR response from a webpage. I found the

await page.waitForResponse(url);

or

await page.waitForResponse((res) => {
  if (res.url() === myUrl) return true;
});

method, but it always times out for the URL response I'm trying to get.

However, if I set

page.on('response', (res) => {
  if (res.url() === myUrl) {
    // do what I want with the response
  }
})

the correct response is found and I can retrieve the data.

After some debugging, it seems that waitForResponse() isn't seeing any of the XHR requests/responses.

Any ideas?

EDIT: Example. In this case, the puppeteer-extra and puppeteer-extra-plugin-stealth packages are required; otherwise, this URL returns status code 403:

import StealthPlugin from 'puppeteer-extra-plugin-stealth';
import UserAgent from 'user-agents';
import puppeteer from 'puppeteer-extra';
import { Page } from 'puppeteer';

const wantedUrl = 'https://www.nike.com.br/DataLayer/dataLayer';

const workingFunction = async (page: Page) => {
    let reqCount = 0;
    let resCount = 0;

    page.on('request', req => {
        reqCount++;
        if (req.url() == wantedUrl) {
            console.log('The request I need: ', req.url());
            console.log(reqCount);
        }
    });
    page.on('response', async res => {
        resCount++;
        if (res.url() == wantedUrl) {
            console.log('The response I need:', await res.json());
            console.log(resCount);
        }
    });

    await page.goto('https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741', {
        timeout: 0,
    });
};

const notWorkingFunction = async (page: Page) => {
    let resCount = 0;
    await page.goto('https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741');
    const res = await page.waitForResponse(
        res => {
            resCount++;
            console.log(res.url());
            console.log(resCount);
            if (res.url() === wantedUrl) {
                return true;
            }
            return false;
        },
        { timeout: 0 }
    );

    return res;
};

(async () => {
    puppeteer.use(StealthPlugin());
    const browser = await puppeteer.launch({});
    const page = await browser.newPage();
    const userAgent = new UserAgent({ deviceCategory: 'desktop' });
    await page.setUserAgent(userAgent.random().toString());

    try {
        // workingFunction(page);
        const res = await notWorkingFunction(page);
    } catch (e) {
        console.log(e);
    }
})();

Solution

  • The page.on version works because it sets the request/response handlers before performing navigation. The waitForResponse version, on the other hand, waits until the "load" event fires (page.goto()'s default resolution point) and only then starts tracking responses with the call to page.waitForResponse. MDN says of the load event:

    The load event is fired when the whole page has loaded, including all dependent resources such as stylesheets and images. This is in contrast to DOMContentLoaded, which is fired as soon as the page DOM has been loaded, without waiting for resources to finish loading.

    Based on this, we can infer that by the time the load event fires and waitForResponse finally starts listening to traffic, the desired response has already arrived. The wait therefore runs until it times out, or forever with timeout: 0 as in the example.
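To make the race concrete, here is a minimal sketch that uses Node's built-in EventEmitter as a stand-in for the page. It is not Puppeteer code: fakePage, demoLateListener, and the example.com URL are placeholders invented for illustration. It only shows that a listener attached after an event has fired never sees that event.

```javascript
const { EventEmitter } = require("events");

// Hypothetical stand-in for a page: "navigating" emits one response event.
// EventEmitter does not replay past events to listeners added later.
function demoLateListener() {
  const fakePage = new EventEmitter();
  const wantedUrl = "https://example.com/api/data"; // placeholder URL

  const early = [];
  const late = [];

  // Attached before navigation: this listener sees the response.
  fakePage.on("response", url => early.push(url));

  // Simulated navigation: the response arrives while the page loads.
  fakePage.emit("response", wantedUrl);

  // Attached after navigation finished: the event has already passed.
  fakePage.on("response", url => late.push(url));

  return { early, late };
}

console.log(demoLateListener());
```

The early array ends up with the one URL and the late array stays empty, which mirrors the difference between the page.on version and the waitForResponse-after-goto version.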

    The solution is to create the promise for page.waitForResponse before (or at the same time as) the goto call, so that no traffic is missed when you kick off navigation.

    I also suggest using "domcontentloaded" on the goto call. "domcontentloaded" is underused in Puppeteer; there's no sense in waiting for all resources to arrive when you're just looking for one. The default "load" and the often-used "networkidle0" and "networkidle2" settings are better suited to use cases like screenshotting, where you want the page to look the way a user would see it. To be clear, this is an optimization rather than the fix for the problem, and the docs don't make it obvious which setting suits which case.

    Here's a minimal example (I used JS, not TS):

    const puppeteer = require("puppeteer-extra"); // ^3.2.3
    const StealthPlugin = require("puppeteer-extra-plugin-stealth"); // ^2.9.0
    const UserAgent = require("user-agents"); // ^1.0.958
    
    puppeteer.use(StealthPlugin());
    
    let browser;
    (async () => {
      browser = await puppeteer.launch();
      const [page] = await browser.pages();
      const userAgent = new UserAgent({deviceCategory: "desktop"});
      await page.setUserAgent(userAgent.random().toString());
      const url = "https://www.nike.com.br/tenis-nike-sb-dunk-low-pro-unissex-153-169-229-284741";
      const wantedUrl = "https://www.nike.com.br/DataLayer/dataLayer";
      const [res] = await Promise.all([
        page.waitForResponse(res => res.url() === wantedUrl, {timeout: 90_000}),
        page.goto(url, {waitUntil: "domcontentloaded"}),
      ]);
      console.log(await res.json());
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close());
    

    (Note that the site has changed since this was posted, so the code no longer works, but the fundamental ideas still apply.)
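The example above shows the "at the same time" variant with Promise.all. The "before" variant can be sketched as a small reusable function. gotoAndCapture is a hypothetical helper name, not part of Puppeteer; it assumes page is a Puppeteer Page and relies only on page.waitForResponse and page.goto. The 30-second default timeout is an arbitrary choice.

```javascript
// Hypothetical helper, not part of Puppeteer's API. `page` is assumed to be a
// Puppeteer Page; pageUrl and wantedUrl are supplied by the caller.
async function gotoAndCapture(page, pageUrl, wantedUrl, timeout = 30_000) {
  // Start listening BEFORE navigating. The promise is created here but not
  // awaited yet, so the listener is active while the page loads.
  const responsePromise = page.waitForResponse(
    res => res.url() === wantedUrl,
    { timeout }
  );

  // Mark the promise as handled so that, if goto() throws first, a later
  // timeout rejection is not reported as an unhandled rejection.
  responsePromise.catch(() => {});

  await page.goto(pageUrl, { waitUntil: "domcontentloaded" });

  // Resolves with the matching response, or rejects on timeout.
  return responsePromise;
}
```

It would be called as const res = await gotoAndCapture(page, url, wantedUrl), followed by await res.json(). Promise.all avoids the extra catch() line because it attaches handlers to both promises itself, which is one reason to prefer it when you don't need anything to happen between the two calls.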