Tags: javascript, node.js, web-scraping, next.js, puppeteer

Wait for an element's class name to change in Puppeteer with the waitForSelector method


I need to wait for an element to change classes.

The problem is that waitForSelector doesn't seem to work here, because no new element is added to the DOM. Instead, an existing <div> element changes its class name.

What's the right approach to wait for an element until its class name changes, or wait until a certain class name appears?

My current code:

import type { NextApiRequest, NextApiResponse } from "next";
const puppeteer = require("puppeteer");
export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse
) {
  const browser = await puppeteer.launch({
    executablePath:
      "../../../../../../Program Files (x86)/Google/Chrome/Application/chrome.exe",
    headless: false,
  });
  const page = await browser.newPage();

  await page.goto("https://www.craiyon.com/", {
    timeout: 0,
    waitUntil: "domcontentloaded",
  });
  await page.waitForTimeout(1000);
  await page.type(".svelte-1g6bo9g", "sausage");
  await page.click("#generateButton");
  const test = await page.waitForSelector(
    ".h-full.w-full.cursor-pointer.rounded-lg.border.border-medium-blue.object-cover.object-center.transition-all.duration-200.hover:scale-[0.97].hover:border-2.hover:border-grey",
    {
      timeout: 0,
    }
  );

  await browser.close();
  console.log(test);
  res.status(200).json({ test: "test" });
}

This is the class name that changes later on:

.h-full.w-full.cursor-pointer.rounded-lg.border.border-medium-blue.object-cover.object-center.transition-all.duration-200.hover:scale-[0.97].hover:border-2.hover:border-grey

And finally this is the class name I'm trying to get: .grid.grid-cols-3.gap-1.sm:gap-2.


Solution

  • I believe you've misunderstood waitForSelector. It doesn't care whether a matching element was newly created or already existed and only had its classes changed. Both are DOM mutations and will register as a match.

    Instead of using the old selector you're waiting to disappear, you can wait for the selector you want to exist. waitForSelector will resolve as soon as that selector is ready regardless of how it made it into the DOM or which element it's on.

    If you want to wait for something to disappear or change, you could use waitForFunction, which is a more general version of waitForSelector.
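
    As a minimal sketch of that idea, the helper below wraps page.waitForFunction to wait until an element gains or loses a class. The name waitForClass, its options, and the selectors in the usage comments are hypothetical and not part of Puppeteer or of this site; it also assumes the element itself stays in the DOM while its classes change.

```javascript
// Hypothetical helper: resolves once the first element matching `selector`
// exists and its class list contains `className` (present: true) or no
// longer contains it (present: false). `page` is a Puppeteer Page.
function waitForClass(page, selector, className, {present = true, timeout = 30_000} = {}) {
  return page.waitForFunction(
    // Runs in the browser and is re-evaluated until it returns a truthy value.
    (sel, cls, want) => {
      const el = document.querySelector(sel);
      return Boolean(el) && el.classList.contains(cls) === want;
    },
    {timeout},
    selector,
    className,
    present
  );
}

// Usage inside an async function, e.g. after page.click("#generateButton"):
//   await waitForClass(page, "#some-element", "some-class");
//   await waitForClass(page, "#some-element", "old-class", {present: false});
```

    The predicate returns false while the element is missing, so waiting for a class to disappear only succeeds if the element is still present without that class.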

    Also, : introduces a pseudo-class in a CSS selector. The class name sm:gap-2 is technically valid, but the selector .sm:gap-2 won't match it unless the colon is escaped. You can leave that class out or use the attribute-style selector suggested in this comment, with the caveat that those can be overly picky: if the order of the classes changes, it'll fail.
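
    If you do want to keep a class that contains a colon, here is a rough sketch of two ways to write the selector. The helper classSelector is hypothetical and deliberately simplified; in browser code, the built-in CSS.escape handles the escaping more thoroughly.

```javascript
// Hypothetical helper: builds a compound class selector from raw class names,
// backslash-escaping every character other than letters, digits, "_" and "-".
// Simplification: class names that start with a digit need a different escape
// and are not handled here.
function classSelector(classNames) {
  return classNames
    .map(name => "." + name.replace(/[^a-zA-Z0-9_-]/g, "\\$&"))
    .join("");
}

console.log(classSelector(["grid", "grid-cols-3", "gap-1", "sm:gap-2"]));
// .grid.grid-cols-3.gap-1.sm\:gap-2

// Order-independent attribute alternative: ~= matches one whitespace-separated
// token of the class attribute, so no escaping is needed inside the quotes.
const gapSelector = '.grid.grid-cols-3.gap-1[class~="sm:gap-2"]';
```

    Either string can be passed to page.waitForSelector. Unlike an exact [class="..."] match, the ~= form doesn't depend on the order of the classes.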

    It seems fine to leave that part out for now, and we can get the image URLs from the responses, which I'm guessing is what you mostly care about:

    const puppeteer = require("puppeteer"); // ^19.6.3
    
    const url = "<Your URL>";
    
    let browser;
    (async () => {
      browser = await puppeteer.launch();
      const [page] = await browser.pages();
      await page.goto(url, {waitUntil: "domcontentloaded"});
      await page.type("#prompt", "sausage");
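      const imgUrls = new Set();
      // Register the listeners before clicking so no image response is missed,
      // then wait for 9 distinct image URLs.
      const responsesArrived = Promise.all(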
        [...Array(9)].map(() =>
          page.waitForResponse(
            res => {
              if (
                res.request().resourceType() === "image" &&
                res.url().startsWith("https://img.craiyon.com") &&
                res.url().endsWith(".webp") &&
                !imgUrls.has(res.url())
              ) {
                imgUrls.add(res.url());
                return true;
              }
            },
            {timeout: 120_000}
          )
        )
      );
      await page.click("#generateButton");
      await responsesArrived;
      console.log([...imgUrls]);
      const grid = await page.waitForSelector(
        ".grid.grid-cols-3.gap-1"
      );
      await grid.screenshot({path: "test.png"});
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close());
    

    Suggestions:

    • Try to avoid waitForTimeout. It's deprecated and causes a race condition, either slowing your script down or making it fail randomly. Puppeteer's docs recommend against using it.
    • Never use timeout: 0, especially when debugging scripts. There's no reason to block forever. If a selector fails and your script never reports where that failure happened, but instead hangs, you miss out on important diagnostic information. If you really have to wait for something or your computer will explode, make it 10 minutes, a day or a week (if you really expect something to take that long), but not infinity. If it's mission-critical, you can catch the throw and retry the action.
    • Avoid long selectors. They're usually brittle because they assume too much about the structure or classes on a page. This page is a bit hostile, offering few quality hooks into the elements, but it's still worth keeping in mind. It's generally considered best to select by user-visible attributes, like roles and text.
    • The site loads a ton of garbage resources, so you'll speed things up and save resources by blocking everything you don't need. You can look at all URLs with page.on("request", req => console.log(req.url())), then systematically block the ones that aren't relevant to getting your result.
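
    Two of these suggestions can be sketched roughly as follows: retrying an action that has a finite timeout, and blocking requests by resource type with Puppeteer's request interception. The helper names withRetry and blockResourceTypes are hypothetical, and the resource types in the usage comment are an assumption about what this page can do without. Images are left alone because the script above waits for image responses.

```javascript
// Hypothetical helper: retries an async action a few times, each attempt
// bounded by whatever timeout the action itself uses.
async function withRetry(action, retries = 2) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Hypothetical helper: aborts requests whose resource type is in `types`
// and lets everything else through. `page` is a Puppeteer Page.
async function blockResourceTypes(page, types) {
  const blocked = new Set(types);
  await page.setRequestInterception(true);
  page.on("request", req => {
    if (blocked.has(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  });
}

// Usage inside an async function, before page.goto(url):
//   await blockResourceTypes(page, ["font", "media"]); // assumed-safe types
//   const grid = await withRetry(() =>
//     page.waitForSelector(".grid.grid-cols-3.gap-1", {timeout: 60_000})
//   );
```

    If every attempt fails, withRetry rethrows the last error, so the script still reports where it got stuck instead of hanging.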

    Disclosure: I'm the author of the linked blog posts.