javascript, node.js, web-scraping, es6-promise, puppeteer

Puppeteer: how to retry a URL fetch with a delay if it failed


I am trying to write a simple web scraper using the puppeteer library.

When I load a page by URL via page.goto, I need to retry if it failed, i.e. if the response status code is >= 400.

My snippet:

'use strict';
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    await page.setViewport({width: 1024, height: 768});
    await page.setDefaultNavigationTimeout(0);

    await page.goto('https://google.com');

    await browser.close();
    process.exit();
})();

I need to implement a failure strategy that retries the URL if the response status code is >= 400.
The delay between attempts should be equal to retryNumber * 1000 ms:

  • 1000 ms for first attempt;
  • 2000 ms for second attempt;
  • 3000 ms for third attempt and so on.

The promise should be rejected if retryNumber exceeds maxRetryNumber.

Does anyone know how to implement this in code? Are there any ready-to-use packages or snippets to achieve this?


Solution

  • Define a small promise-based delay helper; you can then use a simple for loop to execute your retries (exit the loop as soon as a request succeeds):

    'use strict';
    const puppeteer = require('puppeteer');

    // Resolves after `ms` milliseconds; used to pause between attempts.
    const delay = (ms) => {
        return new Promise(resolve => setTimeout(resolve, ms));
    };
    
    (async () => {
        const browser = await puppeteer.launch({headless: false});
        const page = await browser.newPage();
        await page.setViewport({width: 1024, height: 768});
        // 0 disables the navigation timeout.
        await page.setDefaultNavigationTimeout(0);
    
        const maxRetryNumber = 10;
        let success = false;
        for (let retryNumber = 1; retryNumber <= maxRetryNumber; retryNumber++) {
            const response = await page.goto('https://google.com');
            // page.goto can resolve with null, so check before reading the status.
            if (response !== null && response.status() < 400) {
                success = true;
                break;
            }
            // Wait 1000 ms after the first failure, 2000 ms after the second, and so on.
            await delay(1000 * retryNumber);
        }
    
        if (!success) {
            // All attempts failed: do something (e.g. log or throw an error)
        }
    
        await browser.close();
        process.exit();
    })();
    
    

    Source of delay function.
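
  • If you want the rejection behaviour asked for in the question, the same loop can be wrapped in a function. The following is only a minimal sketch under a few assumptions: gotoWithRetry and its baseDelayMs parameter are hypothetical names, not part of Puppeteer; a thrown navigation error and a null response are both treated as failed attempts; and there is no wait after the final attempt.

```javascript
'use strict';

// Resolves after `ms` milliseconds.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper (not a Puppeteer API): navigates `page` to `url` and
// retries while the status is >= 400 or page.goto throws.
// Waits retryNumber * baseDelayMs between attempts and rejects once
// maxRetryNumber attempts have failed.
async function gotoWithRetry(page, url, maxRetryNumber = 10, baseDelayMs = 1000) {
    let lastError = null;
    for (let retryNumber = 1; retryNumber <= maxRetryNumber; retryNumber++) {
        try {
            const response = await page.goto(url);
            // Assumption: a null response counts as a failed attempt.
            if (response !== null && response.status() < 400) {
                return response;
            }
            const status = response === null ? 'unknown' : response.status();
            lastError = new Error(`HTTP status ${status}`);
        } catch (err) {
            // Network errors, DNS failures, timeouts and similar.
            lastError = err;
        }
        // Skip the pause after the last attempt.
        if (retryNumber < maxRetryNumber) {
            await delay(baseDelayMs * retryNumber);
        }
    }
    const reason = lastError ? lastError.message : 'no attempts were made';
    throw new Error(`Failed to load ${url} after ${maxRetryNumber} attempts: ${reason}`);
}
```

    Inside the async function from the snippet above, you could then write const response = await gotoWithRetry(page, 'https://google.com', 10); and handle the rejection with try/catch, closing the browser in a finally block.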