Tags: node.js, web-scraping, aws-lambda, puppeteer, webpage-screenshot

Prevent puppeteer goto() from failing when timeout occurs for screenshotting


I am using puppeteer-core version 19.6.3 on AWS Lambda to take screenshots of webpages. Here's what my code looks like:

  // Create a browser instance
  const browser = await puppeteer.launch({
    args: chromium.args,
    defaultViewport: chromium.defaultViewport,
    executablePath: await chromium.executablePath("./"),
    headless: chromium.headless,
    ignoreHTTPSErrors: true,
  });

  // Create a new page
  const page = await browser.newPage();

  // Set viewport width and height
  await page.setViewport({ width: pageWidth, height: pageHeight, deviceScaleFactor: scaleFactor });
  // Open URL in current page
  await page.goto(websiteURL);

  // Capture screenshot
  const screenshot = await page.screenshot();

This works most of the time, but sometimes it fails when the page I want to screenshot is heavy and has lots of iframes. In those cases it fails with "Navigation timeout of 30000 ms exceeded". I don't want to get rid of the timeout completely, because I don't want this function to take a potentially long time or never finish.

Is there some way I can call page.goto() and have it wait for the page to load, but if the page hasn't loaded after 30 seconds (or some other defined timeout), have it continue and take a screenshot of the partially loaded page anyway? I don't care if the resulting screenshot looks bad or incomplete; I still want it to happen no matter what.


Solution

  • I wrapped page.goto() in a try/catch so that the code continues on to the screenshot even if navigation times out while the page is still loading.

      // Open URL in current page
      try {
        await page.goto(websiteURL);
      } catch (error) {
        console.error("Error going to page, but continuing with screenshot: ", error);
      }
    
      // Capture screenshot
      const screenshot = await page.screenshot();
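
The try/catch above swallows every navigation error, not only timeouts. A possible refinement is sketched below. The helper name gotoBestEffort and its timeoutMs parameter are made up for illustration. It also assumes that Puppeteer's timeout errors have their name property set to "TimeoutError", which is worth confirming against the version in use. The timeout and waitUntil options passed to page.goto() are standard navigation options.

```javascript
// Hypothetical helper (not part of Puppeteer): navigate, but treat a
// navigation timeout as "good enough" so the caller can still screenshot.
// Returns true if the page finished loading, false if the timeout was hit.
async function gotoBestEffort(page, url, timeoutMs = 30000) {
  try {
    // timeout and waitUntil are regular page.goto() options
    await page.goto(url, { timeout: timeoutMs, waitUntil: "load" });
    return true;
  } catch (error) {
    // Assumption: Puppeteer timeout errors carry name === "TimeoutError"
    if (error && error.name === "TimeoutError") {
      console.warn(`Navigation to ${url} timed out after ${timeoutMs} ms; continuing.`);
      return false;
    }
    // Anything else (DNS failure, invalid URL, crashed page) is rethrown
    throw error;
  }
}
```

It would be used in place of the bare page.goto() call, for example await gotoBestEffort(page, websiteURL, 30000); followed by the same page.screenshot() call as before. Rethrowing non-timeout errors is a design choice: it keeps genuine failures visible instead of producing a screenshot of an error page or a blank tab.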