
How can you get an element's text content without getting its children's text content?


Let's say you are using Playwright to validate some HTML that looks like this:

<span>
  The time is:
  <time>5:30 pm</time>
</span>

You can use this code:

await page.locator('span', {has: page.locator('time')}).textContent();

to get:

The time is: 5:30 pm

But what if you just want the first part, since it won't change?

The time is:

Is there any way to get an element's text content without getting its children's text?

Currently the only solution I can come up with is to get the text of both and then remove the child's text:

const parent = page.locator('span', {has: page.locator('time')});
const parentText = await parent.textContent();
const child = parent.locator('time');
const childText = await child.textContent();
const onlyParentText = parentText.replace(childText, '').trim();

...but that's a lot of JavaScript just to get a single DOM node's text.

Is there any easier way to do the above using Playwright features?


Solution

  • I don't think Playwright has this built-in, so going into evaluate is probably the best approach:

    const text = await page
      .locator("span", {has: page.locator("time")})
      .evaluate(el => el.firstChild.textContent);
    

    To generalize this to cases with multiple text nodes, or text at arbitrary positions within the parent, keep only the child nodes that are text nodes:

    const text = await page
      .locator("span", {has: page.locator("time")})
      .evaluate(el =>
        [...el.childNodes]
          .filter(e => e.nodeType === Node.TEXT_NODE)
          .map(e => e.textContent)
      );
    

    Expect to trim and join text as necessary. For example:

    const playwright = require("playwright"); // ^1.39.0
    
    const html = `
    <p>
      a <b>ignore this</b>
    </p>
    <p> b <b>ignore this</b> c </p>
    <p> d <b>ignore this</b> e </p>`;
    
    let browser;
    (async () => {
      browser = await playwright.firefox.launch();
      const page = await browser.newPage();
      await page.setContent(html);
      const text = await page
        .locator("p")
        .evaluateAll(els =>
          els.map(el =>
            [...el.childNodes]
              .filter(
                e =>
                  e.nodeType === Node.TEXT_NODE &&
                  e.textContent.trim()
              )
              .map(e => e.textContent.trim())
          )
        );
      console.log(text); // => [ [ 'a' ], [ 'b', 'c' ], [ 'd', 'e' ] ]
    })()
      .catch(err => console.error(err))
      .finally(() => browser?.close());
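
    The example above trims each text node but leaves the pieces in an array. If a single string is wanted, a small helper along these lines might do. This is only a sketch: `ownText` is a hypothetical name, not a Playwright API, and the `fakeSpan` object merely imitates the DOM shape from the question so the snippet runs in plain Node without a browser.

    ```javascript
    // Hypothetical helper: returns an element's own text, skipping any text
    // that belongs to child elements. It is self-contained (no outer
    // variables) so it can also be passed directly to locator.evaluate().
    function ownText(el) {
      return [...el.childNodes]
        .filter(node => node.nodeType === 3) // 3 is the value of Node.TEXT_NODE
        .map(node => node.textContent.trim())
        .filter(Boolean) // drop whitespace-only text nodes
        .join(" ");
    }

    // Stand-in for the <span> in the question (assumed shape, for illustration).
    const fakeSpan = {
      childNodes: [
        {nodeType: 3, textContent: "\n  The time is:\n  "},
        {nodeType: 1, textContent: "5:30 pm"}, // the <time> element
        {nodeType: 3, textContent: "\n"},
      ],
    };

    console.log(ownText(fakeSpan)); // => The time is:
    ```

    In a Playwright script, the same function could presumably be used as `await page.locator("span", {has: page.locator("time")}).evaluate(ownText)`.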
    

    As mentioned in the comments, if you're asserting in a test, it's probably best to use:

    await expect(locator).toHaveText(/^\s*The time is:/m);
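
    To see why that pattern tolerates a changing time, here is a quick check in plain JavaScript. It is a sketch rather than a Playwright test: `spanText` is a hypothetical helper that only approximates the raw text of the `<span>` from the question.

    ```javascript
    // Same pattern as in the assertion above.
    const pattern = /^\s*The time is:/m;

    // Hypothetical helper: approximates the span's text for a given time.
    const spanText = time => `\n  The time is:\n  ${time}\n`;

    console.log(pattern.test(spanText("5:30 pm")));  // => true
    console.log(pattern.test(spanText("11:45 am"))); // => true
    console.log(pattern.test("Time: 5:30 pm"));      // => false
    ```

    The pattern anchors only the fixed prefix, so whatever follows it (the child's text) does not affect the match.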