Tags: javascript, html5-canvas, puppeteer

Puppeteer Cannot Access Canvas Functions, but Chrome Dev Tools Can


I cannot figure out why a snippet I run in Chrome Dev Tools works perfectly, but I can't get it to work in Puppeteer. As an experiment, I was trying to "drive" the canvas for the NY Times Letter Boxed game: https://www.nytimes.com/puzzles/letter-boxed

In Google Chrome, I can get a handle to the canvas functions by selecting the canvas element and then extracting the "__reactInternalInstance" property from it (usually the property is named something like "__reactInternalInstance$2c4oug63f7m"). The javascript that does it is here:

var myObject = {};
var mycanvas = document.querySelector("#pz-game-root > div > div > div.lb-square-container > canvas")
for (let property in mycanvas) {
  if (property.includes("__reactInternalInstance")) {
      console.log(`${property}`);
      myObject = mycanvas[`${property}`];
  }
}
console.log(myObject);
//at this point IN CHROME,
//myObject has access to some of the interactive functions of the game play

Nearly identical code in puppeteer cannot seem to get access to any canvas properties, including the "__reactInternalInstance" property.

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        //args: ["--renderer","--single-process"],
        executablePath: "/usr/bin/google-chrome",
        headless: true,
        userDataDir: '/tmp'
    });

    const page = await browser.newPage();

    //use a mobile user agent
    page.setUserAgent("Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Mobile Safari/537.36");
    await page.setViewport({ width: 412, height: 915 });

    await page.goto('https://www.nytimes.com/puzzles/letter-boxed');
    try {

        /**
         * click the start button on the splash screen
         */
        await page.waitForSelector("button.pz-moment__button.primary");
        let startButton = await page.$("button.pz-moment__button.primary");
        await startButton.click();
        await delay(3000);

        /* !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
         * myHandle object comes back null,
         * no properties are logged,
         * and the "Found the canvas" message doesn't print
         * !!!!!!!!!!!!!!!!!!!!!!!!!!!!!
         */
        //capture the canvas element
        await page.waitForSelector("#pz-game-root > div > div > div.lb-square-container > canvas");
        let myHandle = await page.$("#pz-game-root > div > div > div.lb-square-container > canvas",
            (canv) => {
                console.log("Found the canvas");
                for (let property in canv) {
                    console.log(`${property}`);
                }
                return canv;
            }
        );
        console.log(myHandle);

    } catch (error) {
        console.log("Failure occurred");
        console.log(error);
    }
    await browser.close();
})();

function delay(time) {
    return new Promise(resolve => setTimeout(resolve, time));
}

I'm not very proficient with Puppeteer. Anyone able to tell me what I'm doing wrong?


Solution

  • I cannot figure out why a snippet I run in Chrome Dev Tools works perfectly, but I can't get it to work in Puppeteer

    This happens all the time. Websites are complex, and you can seldom copy browser code into Puppeteer and expect it to work. For example, the browser console will expose iframes not accessible in Puppeteer (but that's not the problem here). Visibility, timing, bot detection and differences between native DOM methods and their Puppeteer counterparts lead to all manner of discrepancies.

    The main issue here, though, is misunderstanding the Puppeteer API, which is admittedly not easy to grasp. page.$ doesn't accept a callback, so the function you passed as a second argument is silently ignored and never runs. page.$ is basically a wrapper on document.querySelector that returns an ElementHandle if the selector matches an element, else null.

    You're probably thinking of page.$eval, which accepts two arguments, a selector and a callback (and, optionally, variable arguments that populate the callback's arguments when executed in the browser).
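
    As a rough sketch of what the page.$eval form could look like, here is a small helper. The name listPropertyNames is made up for illustration, and it assumes page is a Puppeteer Page that has already navigated to the game. It returns an array of strings so the result can travel back to Node:

```javascript
// Hypothetical helper: collect the enumerable property names of the first
// element matching `selector`. The callback runs in the browser, so it
// returns plain strings, which serialize cleanly back to Node.
async function listPropertyNames(page, selector) {
  return page.$eval(selector, (el) => {
    const names = [];
    for (const name in el) {
      names.push(name);
    }
    return names;
  });
}

// Usage, inside an async function with a live Puppeteer page:
//   const names = await listPropertyNames(page, "#pz-game-root canvas");
//   console.log(names.filter((n) => n.startsWith("__reactInternalInstance")));
```

    Logging happens on the Node side here, after the names come back, rather than inside the browser callback.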

    Even if you do get that callback working, keep in mind the console.log calls won't be visible from your Node process unless you monitor the browser console.
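
    One way to monitor the browser console is Puppeteer's "console" page event. The sketch below is a minimal version; the helper name forwardBrowserConsole and the output format are my own choices, not anything from the question:

```javascript
// Hypothetical helper: relay browser console messages to a Node-side sink.
// page.on("console", ...) hands the listener a ConsoleMessage, whose
// type() and text() methods describe the call made in the page.
function forwardBrowserConsole(page, sink = console.log) {
  page.on("console", (msg) => {
    sink(`[browser ${msg.type()}] ${msg.text()}`);
  });
}

// Usage: register once, before the code that logs runs in the page.
//   forwardBrowserConsole(page);
//   await page.$eval("#pz-game-root canvas", () => console.log("Found the canvas"));
```

    Logged objects tend to show up as handle descriptions rather than their contents, so this works best for string messages.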

    return canv; is also not going to work. All return values from the browser to Puppeteer must be serializable, so complex structures like DOM elements, React instances and so forth typically come back as empty objects or undefined.

    I'm not sure what you're trying to achieve ultimately, but if you're planning on returning data back to Node to work with there, you'll need to pick out the primitive serializable properties and return those. Otherwise, you can return an ElementHandle with page.evaluateHandle or, most likely, perform your manipulations on the canvas to play the game in the browser context.
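
    To illustrate picking out serializable pieces, here is a sketch of a function meant to run in the browser. The name summarizeCanvas and the choice of fields are assumptions for the example; the point is that it returns only strings, numbers and an array of strings:

```javascript
// Hypothetical browser-side function: reduce a canvas element to a plain
// object of primitives plus the names (not the values) of any React
// expando keys, so the whole result is serializable.
function summarizeCanvas(el) {
  return {
    tagName: el.tagName,
    width: el.width,
    height: el.height,
    reactKeys: Object.keys(el).filter((k) => k.startsWith("__react")),
  };
}

// Usage with a live page. The function is self-contained, so it can be
// passed directly and Puppeteer will run it in the browser:
//   const canvas = await page.waitForSelector("#pz-game-root canvas");
//   const summary = await canvas.evaluate(summarizeCanvas);
//
// To keep a reference to a non-serializable object instead, ask for a handle:
//   const handle = await page.evaluateHandle(
//     () => document.querySelector("#pz-game-root canvas")
//   );
```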

    It's unclear why you want to dig into the React instance (is some solution data buried in it?), but I generally don't recommend that approach. It's best to stay as close to the user interface and network as possible. If there's data React uses, it can usually be more reliably and easily intercepted from a network request or pulled out of a JSON payload that arrives with the static HTML. If you want to check the state of the game or make moves, prefer doing it through the visible UI elements.
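
    If the data does arrive in a separate request, something like the following sketch could capture it with page.waitForResponse. I haven't checked how this particular game loads its data, so the URL fragment in the usage comment is purely hypothetical, as is the helper name captureJson:

```javascript
// Hypothetical helper: start listening for a matching response, run the
// action that triggers it, then parse the response body as JSON.
async function captureJson(page, urlPart, action) {
  const [response] = await Promise.all([
    page.waitForResponse((res) => res.url().includes(urlPart) && res.ok()),
    action(),
  ]);
  return response.json();
}

// Usage with a live page ("/some/puzzle-data" is a made-up placeholder):
//   const data = await captureJson(page, "/some/puzzle-data", () =>
//     page.goto("https://www.nytimes.com/puzzles/letter-boxed")
//   );
```

    Setting up the wait before triggering the action avoids missing a response that arrives quickly.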

    That said, as a proof of concept, here I'm returning memoizedProps out of the React instance:

    const puppeteer = require("puppeteer"); // ^16.2.0
    
    let browser;
    (async () => {
      browser = await puppeteer.launch({headless: true});
      const [page] = await browser.pages();
      const ua = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Mobile Safari/537.36";
      await page.setUserAgent(ua);
      await page.setViewport({width: 412, height: 915});
      const url = "https://www.nytimes.com/puzzles/letter-boxed";
      await page.goto(url, {waitUntil: "domcontentloaded"});
      const btn = await page.waitForSelector("button.pz-moment__button.primary");
      await btn.click();
      const sel = await page.waitForSelector("#pz-game-root canvas");
      const memoizedProps = await sel.evaluate(el => 
        Object
          .entries(el)
          .find(([k, v]) => k.startsWith("__reactInternalInstance"))
          .pop()
          .memoizedProps
      );
      console.log(memoizedProps);
      // => { width: 320, height: 320, style: { width: 320, height: 320 } }
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close())
    ;
    

    Other suggestions:

    • Avoid rigid, hyper-specific selectors. #pz-game-root canvas is more robust and adaptable without much loss of precision (how many <canvas> elements do we expect to be under this id?) than a long chain of div > div > div > divs. Browser-generated selectors are often to blame for this practice.
    • There's usually no need to select things multiple times--waitForSelector returns the ElementHandle so you can .click() or .evaluate() on that.
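    • Try to avoid sleep/delay/timeout almost always in favor of waitForSelector, waitForFunction, waitForResponse, etc. If you have to sleep, Puppeteer at this version (^16.2.0) already gives you page.waitForTimeout, so you don't have to reimplement it as you did with delay. Note that page.waitForTimeout was later deprecated and removed from newer releases, where a promise-wrapped setTimeout like your delay is the usual substitute.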
    • page.setUserAgent() returns a promise which you should await. Failing to await all promises in Puppeteer causes race conditions and bizarre, intermittent errors with confusing messages.

    My blog post explains in more detail why these are poor practices that should be avoided.
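
    The suggestions above about awaiting promises, reusing handles and waiting on conditions can be sketched together as follows. The helper name startGame and its options object are hypothetical, and treating a nonzero canvas size as the "ready" signal is my assumption rather than something verified against the game:

```javascript
// Hypothetical helper combining the suggestions: await every promise,
// reuse the handle that waitForSelector returns, and wait on a condition
// rather than sleeping for a fixed time.
async function startGame(page, { userAgent, url, buttonSelector, canvasSelector }) {
  await page.setUserAgent(userAgent); // returns a promise, so await it
  await page.goto(url, { waitUntil: "domcontentloaded" });

  // waitForSelector resolves to the ElementHandle, so click it directly.
  const button = await page.waitForSelector(buttonSelector);
  await button.click();

  // Poll in the browser until the canvas exists and has a nonzero size
  // (an assumed readiness condition; adjust to whatever the page needs).
  await page.waitForFunction(
    (sel) => {
      const canvas = document.querySelector(sel);
      return Boolean(canvas) && canvas.width > 0 && canvas.height > 0;
    },
    { timeout: 10000 },
    canvasSelector
  );

  return page.$(canvasSelector);
}

// Usage with a live page:
//   const canvas = await startGame(page, {
//     userAgent: ua,
//     url: "https://www.nytimes.com/puzzles/letter-boxed",
//     buttonSelector: "button.pz-moment__button.primary",
//     canvasSelector: "#pz-game-root canvas",
//   });
```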