Tags: javascript, node.js, parallel-processing, automated-tests, playwright

Playwright: how to get all links inside a div and check whether each link returns a 200?


At a high level, I want to:

  1. Go to some page, then use a locator (a div, etc.) to pull the href of every a tag inside it.
  2. Visit each link individually and check whether it is valid or broken, i.e. whether its status code is 200.
  3. One caveat: step 2 should run in parallel, otherwise checking each link serially would be extremely slow.

This is the code that I have for now (I've removed some sensitive info like URL, etc).

test.describe(`@Header Tests`, () => {
  test.beforeEach(async ({ page, context }) => {
    await page.goto(".....some url.....");
  });
  test(`@smoke Validate all header links give a status of 200`, async ({
    page,
  }) => {
    const elements = page.locator("section#mega-nav a");
    const links = await elements.evaluateAll<string[], HTMLAnchorElement>(
      (itemTexts) =>
        itemTexts
          .map((item) => item.href)
          .filter((href) => href !== "" && !href.includes("javascript:"))
    );

    // visit each link
    for (const link of links) {
      test(`Check status code for ${link}`, async () => {
        // Visit the link
        const response = await page.goto(link, {
          waitUntil: "domcontentloaded",
        });

        // Check if the response status code is 200
        expect(response?.status()).toBe(200);
      });
    }
  });
});

But when I run this, I get the following error:

    Error: Playwright Test did not expect test() to be called here.
    Most common reasons include:
    - You are calling test() in a configuration file.
    - You are calling test() in a file that is imported by the configuration file.
    - You have two different versions of @playwright/test. This usually happens
      when one of the dependencies in your package.json depends on @playwright/test.

Is it even possible to do this in Playwright? That is, to first fetch all the links inside a div on a page and then visit each of them in parallel to check its status code?


Solution

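  • Playwright doesn't allow test() to be declared inside a test that is already running, which is what triggers that error, so loop over the links inside a single test instead. I'd also use request rather than page.goto: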

    import {test} from "@playwright/test"; // ^1.41.2
    
    const html = `<!DOCTYPE html><html><body>
    <a href="https://news.ycombinator.com">yc</a>
    <a href="https://www.example.com">example</a>
    <a href="https://www.stackoverflow.com">so</a>
    <a href="https://www.badurlthatdoesntexist.com">bad url</a>
    </body></html>`;
    
    test("all links are valid", async ({page, request}) => {
      await page.setContent(html);
      const links = await page.locator("a")
        .evaluateAll(els => els.map(el => el.href));
    
      for (const link of links) {
        await request.get(link);
      }
    });
    

    (remove www.badurlthatdoesntexist.com to see the test pass)
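
Note that request.get only rejects on network-level failures (such as the unresolvable host above); a 404 or 500 still resolves with a response, so the status has to be checked explicitly. A minimal sketch of one way to do that follows. The helper name findBrokenLinks and its injected getStatus parameter are hypothetical, not part of Playwright; they are only here to keep the logic independent of the HTTP client.

```javascript
// Hypothetical helper: returns an entry for every link whose status is
// not 200 or whose request failed outright. `getStatus` is any async
// function that maps a URL to an HTTP status code.
async function findBrokenLinks(links, getStatus) {
  const broken = [];
  for (const link of links) {
    try {
      const status = await getStatus(link);
      if (status !== 200) {
        broken.push({link, status});
      }
    } catch (err) {
      // Network-level failure (DNS lookup, refused connection, timeout, ...)
      const error = err instanceof Error ? err.message : String(err);
      broken.push({link, error});
    }
  }
  return broken;
}
```

Inside the test (with expect also imported from "@playwright/test"), something like `expect(await findBrokenLinks(links, async link => (await request.get(link)).status())).toEqual([])` would then report every broken link at once rather than stopping at the first failure.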

    To speed this up, you can use a task queue (either hand-rolled or a library). Or, less optimally but with simple code and no dependencies, you can iterate in chunks of size N and use Promise.all to parallelize each chunk:

    test("all links are valid", async ({page, request}) => {
      await page.setContent(html);
      const links = await page.locator("a")
        .evaluateAll(els => els.map(el => el.href));
      const chunk = 3;
    
      for (let i = 0; i < links.length; i += chunk) {
        await Promise.all(links.slice(i, i + chunk).map(e => request.get(e)));
      }
    });
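
The hand-rolled task queue mentioned above could look something like the following sketch. The name mapWithConcurrency and its parameters are hypothetical. The idea is that a fixed number of workers each pull the next link as soon as they finish the previous one, so a single slow link doesn't hold up a whole chunk.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `limit`
// calls in flight at once. Results keep the order of `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runWorker() {
    // Claiming an index is synchronous, so no two workers take the same item.
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({length: workerCount}, () => runWorker()));
  return results;
}
```

In the test above, `await mapWithConcurrency(links, 3, link => request.get(link));` could replace the chunked loop. As with Promise.all, the first rejection propagates and fails the test.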