I'm writing a parser that should extract "Extract This Text" from the following HTML:
<div class="a">
<h1>some random text</h1>
<div class="clear"></div>
Extract This Text
<p></p>
<h2></h2>
</div>
I've tried to use:
document.querySelector('div.a > :nth-child(3)');
And even the next-sibling combinator:
document.querySelector('div.a > :nth-child(2) + *');
But both skip it and return only the "p" element.
The only solution I see here is selecting the previous node and then using nextSibling
to access it.
Can querySelector select text nodes at all?
Text node: https://developer.mozilla.org/en-US/docs/Web/API/Text
As already answered, CSS has no selector for text nodes, so document.querySelector cannot select them either.
However, browsers do expose an XPath evaluator to JavaScript through the method document.evaluate,
which supports many more selectors, axes, and operators, including ones that address text nodes.
let result = document.evaluate(
'//div[@class="a"]/div[@class="clear"]/following-sibling::text()[1]',
document,
null,
XPathResult.STRING_TYPE
).stringValue;
console.log(result.trim());
<body>
<div class="a">
<h1>some random text</h1>
<div class="clear"></div>
Extract This Text
<p></p>
But Not This Text
<h2></h2>
</div>
</body>
A leading // means the node may sit beneath any number of ancestor nodes.
/html/body/div[@class="a"] would address the node absolutely.
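When more than one text node is of interest (the sample markup also contains "But Not This Text"), a snapshot result can be iterated instead of reading a single string. The following is only a sketch: collectTexts is a hypothetical helper name, and the expression //div[@class="a"]/text(), which selects the direct text-node children of the div, is chosen here purely for illustration.

```javascript
// Hypothetical helper: evaluates `expression` against `doc` and returns the
// trimmed, non-empty text of every matched node, in document order.
function collectTexts(doc, expression) {
  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE; the fallback
  // keeps the function loadable where the XPathResult global is missing.
  const SNAPSHOT = globalThis.XPathResult?.ORDERED_NODE_SNAPSHOT_TYPE ?? 7;
  const snapshot = doc.evaluate(expression, doc, null, SNAPSHOT, null);
  const texts = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    const text = snapshot.snapshotItem(i).textContent.trim();
    if (text) texts.push(text); // skip whitespace-only nodes between elements
  }
  return texts;
}

// In a browser, against the sample markup:
// collectTexts(document, '//div[@class="a"]/text()');
```

Against the sample markup this should yield both strings, because the whitespace-only text nodes between the elements are filtered out after trimming.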
It should be mentioned that CSS queries generally perform much better than the very powerful XPath evaluation. Therefore, avoid excessive use of document.evaluate
when document.querySelectorAll
works as well. Reserve it for the cases where you really need to query the DOM with complex expressions.
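For this particular markup, a lighter alternative is the approach the question already hints at: select the preceding element with a CSS query and walk its siblings. This is a minimal sketch, not the only way to do it; nextTextAfter is a hypothetical helper name, and the numeric constants mirror Node.TEXT_NODE and Node.ELEMENT_NODE.

```javascript
// Hypothetical helper: returns the trimmed content of the first non-empty
// text node that follows `node` among its siblings. Returns null if an
// element comes first or if no siblings remain.
function nextTextAfter(node) {
  const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
  const TEXT_NODE = 3;    // same value as Node.TEXT_NODE
  for (let sib = node.nextSibling; sib; sib = sib.nextSibling) {
    if (sib.nodeType === ELEMENT_NODE) return null;
    if (sib.nodeType === TEXT_NODE) {
      const text = sib.textContent.trim();
      if (text) return text;
    }
  }
  return null;
}

// In a browser, with the markup from the question:
// nextTextAfter(document.querySelector('div.a > .clear'));
```

Stopping at the first element sibling keeps the helper from picking up unrelated text further down, such as "But Not This Text" in the sample markup.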