
Select "Text" node using querySelector


I'm writing a parser that should extract "Extract This Text" from the following html:

<div class="a">
    <h1>some random text</h1>
    <div class="clear"></div>
    Extract This Text
    <p></p>
    <h2></h2>
</div>

I've tried to use:

document.querySelector('div.a > :nth-child(3)');

And even by using next sibling:

document.querySelector('div.a > :nth-child(2) + *');

But both skip the text and return only the "p" element.

The only solution I see here is selecting the previous node and then using nextSibling to access it.

Can querySelector select text nodes at all?
Text node: https://developer.mozilla.org/en-US/docs/Web/API/Text


Solution

  • As already answered, CSS has no selector for text nodes, so document.querySelector cannot return one.

    However, JavaScript does provide an XPath evaluator through the document.evaluate method, which offers many more selectors, axes and operators, and can address text nodes as well.

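    // Evaluate the XPath against the whole document and read the result as a string.
    const result = document.evaluate(
      '//div[@class="a"]/div[@class="clear"]/following-sibling::text()[1]',
      document,                // context node
      null,                    // namespace resolver (not needed for plain HTML)
      XPathResult.STRING_TYPE, // return the string value of the first match
      null                     // no existing XPathResult to reuse
    ).stringValue;

    console.log(result.trim()); // "Extract This Text"

    Run against this markup:

    <body>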
      <div class="a">
        <h1>some random text</h1>
        <div class="clear"></div>
        Extract This Text
        <p></p>
        But Not This Text
        <h2></h2>
      </div>
    </body>

    A leading // selects matching nodes at any depth in the document, whatever their ancestors are.
    /html/body/div[@class="a"] would instead address the node by its absolute path.

    It should be mentioned that CSS queries generally perform better than the far more powerful XPath evaluation. So avoid document.evaluate where document.querySelectorAll does the job, and reserve it for cases where you really need complex expressions to query the DOM.
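
For this particular markup, the text directly follows an element that a CSS selector can reach, so XPath is not strictly required. The following is only a sketch of that alternative: textAfter is a hypothetical helper name, not a DOM API, and it assumes the wanted text sits between the selected element and the next element, as in the HTML above.

```javascript
// Numeric node type of text nodes (same value as Node.TEXT_NODE in browsers).
const TEXT_NODE = 3;

/**
 * Hypothetical helper: returns the trimmed content of the first non-empty
 * text node that follows `element` among its siblings. It stops at the
 * first non-text sibling and returns null if no text was found before it.
 */
function textAfter(element) {
  for (let node = element.nextSibling; node; node = node.nextSibling) {
    if (node.nodeType !== TEXT_NODE) break; // reached another element or a comment
    const text = node.textContent.trim();
    if (text) return text; // skip whitespace-only text nodes
  }
  return null;
}

// Browser usage with the markup from the question.
if (typeof document !== "undefined") {
  const clear = document.querySelector("div.a > .clear");
  if (clear) console.log(textAfter(clear)); // "Extract This Text"
}
```

Stopping at the first non-text sibling keeps the helper from picking up "But Not This Text" when the element is followed immediately by another element. If the text can sit further away, the XPath expression above is the more flexible choice.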