Tags: javascript, puppeteer, ethereum, etherscan

Puppeteer element selection returning null or timing out


I am trying to use Puppeteer to extract the innerHTML of a button on a webpage. For now, I am simply waiting for the selector to appear so that I can then work with the element.

When I run the code below, the program times out waiting for the selector.

const puppeteer = require("puppeteer");

const link =
  "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";

async function configureBrowser() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(link);

  // Return the browser as well so the caller can close it.
  return { browser, page };
}

async function findFee(page) {
  await page.reload({ waitUntil: ["networkidle0", "domcontentloaded"] });
  await page.waitForSelector("#txfeebutton");
  console.log("boom");
}

const setup = async () => {
  const { browser, page } = await configureBrowser();
  await findFee(page);
  await browser.close();
};

setup();

As you can see below, the element definitely exists in the HTML:

[Screenshot: HTML evidence]

Console output:

[Screenshot: console output]


Solution

  • It works fine once a user agent string is set:

    const puppeteer = require("puppeteer"); // ^19.0.0
    
    let browser;
    (async () => {
      browser = await puppeteer.launch({headless: true});
      const [page] = await browser.pages();
      const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
      await page.setExtraHTTPHeaders({"Accept-Language": "en-US,en;q=0.9"});
      await page.setUserAgent(ua);
      const url = "https://etherscan.io/tx/0xb06c7d09611cb234bfcd8ccf5bcd7f54c062bee9ca5d262cc5d8f3c4c923bd32";
      await page.goto(url);
      const btn = await page.waitForSelector("#txfeebutton");
      console.log(await btn.evaluate(el => el.textContent.trim())); // => ($0.56)
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close())
    ;
    

    One debugging strategy for this is trying the same script with headless: false to see if that works, then checking page.content() when running headlessly. Doing so shows that Cloudflare is detecting your scraper and presenting a captcha.
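The page.content() check described above can be wrapped in a small helper. This is only a rough sketch: waitOrDiagnose is a hypothetical name, not part of Puppeteer, and the 10 second timeout and 500 character snippet length are arbitrary assumptions. It relies only on page.waitForSelector(), page.title() and page.content().

```javascript
// Hypothetical helper (not a Puppeteer API): wait for a selector and, if the
// wait fails, report what the page actually served instead of just timing out.
// `page` is a Puppeteer Page, or anything exposing the same three methods.
async function waitOrDiagnose(page, selector, timeout = 10000) {
  try {
    await page.waitForSelector(selector, { timeout });
    return { found: true };
  } catch (err) {
    // The wait failed, so inspect the document that was really loaded.
    const html = await page.content();
    return {
      found: false,
      error: err.message,
      title: await page.title(),
      // The start of the markup is usually enough to spot a captcha or block page.
      snippet: html.slice(0, 500),
    };
  }
}
```

Called as await waitOrDiagnose(page, "#txfeebutton") after page.goto(url), a headless run that is served a captcha should come back with found: false plus the title and markup of the challenge page, which is easier to act on than a bare timeout.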
