Given a fetched HTML page, I want to find the specific node that contains a given piece of text. The hard way, I guess, would be to iterate over all the nodes one by one, going as deep as the tree goes, and search each one with e.g. .includes().
But what is the smart way? There must be something, but I haven't been able to find the right search terms for it.
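const response = await axios.get(url); // axios.get returns a Promise; await it inside an async function
const parser = new DOMParser();
const parsedHtml = parser.parseFromString(response.data, 'text/html');
// Only the document's direct children are checked here
for (let i = 0; i < parsedHtml.children.length; i++) {
  if (parsedHtml.children[i].textContent.includes('hello')) {
    console.log(parsedHtml.children[i]);
  }
}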
It doesn't work: the loop only looks at the top-level children, so it never gets down to the nested node.
Example HTML:
<html>
  <body>
    <div>dfsdf</div>
    <div>
      <div>dfsdf</div>
      <div>dfsdf</div>
    </div>
    <div>
      <div>
        <div>hello</div>
      </div>
    </div>
    <div>dfsdf</div>
  </body>
</html>
I would like to retrieve <div>hello</div> as a node element.
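The "hard way" described above, going as deep as possible and checking each node with .includes(), can be sketched as a short recursive function. This is only a minimal sketch: `findDeepestContaining` is a hypothetical helper name, and it relies on nothing but the `textContent` and `children` properties, so it should behave the same on a parsed document's body as on the plain-object stand-in used here.

```javascript
// Hypothetical helper: returns the deepest element whose text contains `text`,
// or null if the text does not occur under `node`.
function findDeepestContaining(node, text) {
  if (!node.textContent.includes(text)) return null;
  // Prefer a matching descendant over the node itself.
  for (const child of node.children) {
    const hit = findDeepestContaining(child, text);
    if (hit) return hit;
  }
  // No child contains the text, so this node is the deepest match.
  return node;
}

// Plain-object stand-in for the nested <div>s in the example (not real DOM nodes).
const leaf = { nodeName: 'DIV', textContent: 'hello', children: [] };
const tree = {
  nodeName: 'BODY',
  textContent: 'dfsdfhello',
  children: [
    { nodeName: 'DIV', textContent: 'dfsdf', children: [] },
    { nodeName: 'DIV', textContent: 'hello', children: [
      { nodeName: 'DIV', textContent: 'hello', children: [leaf] },
    ] },
  ],
};

console.log(findDeepestContaining(tree, 'hello') === leaf); // true
```

With a real page, the call would presumably be findDeepestContaining(parsedHtml.body, 'hello'). Note that it returns only the first match in document order.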
After being almost convinced that I had to traverse the DOM the classic way, I found this in "Javascript: How to loop through ALL DOM elements on a page?", which works really well:
let nodeIterator = document.createNodeIterator(
  parsedHtml,
  NodeFilter.SHOW_ELEMENT,
  (node) => {
    return (node.textContent.includes('mytext1')
        || node.textContent.includes('mytext2'))
      && node.nodeName.toLowerCase() !== 'script' // not interested in scripts
      && node.children.length === 0 // leaf element: no child elements
      ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT;
  }
);
let pars = [];
let currentNode;
while ((currentNode = nodeIterator.nextNode())) {
  pars.push(currentNode);
}
console.log(pars[0].textContent); // for example
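The filter callback above mixes the matching rule with the NodeFilter constants. As a rough sketch, the rule can be pulled out into a small predicate that is easy to check on its own. `isLeafWithText` is a hypothetical name, and the objects below are plain stand-ins for elements, not real DOM nodes.

```javascript
// Hypothetical predicate mirroring the filter above: true for a childless,
// non-script node whose text contains at least one of the given strings.
function isLeafWithText(node, needles) {
  return needles.some((needle) => node.textContent.includes(needle))
    && node.nodeName.toLowerCase() !== 'script'
    && node.children.length === 0;
}

// Plain-object stand-ins (assumed shapes, not real DOM nodes).
const para = { nodeName: 'DIV', textContent: 'some mytext1 here', children: [] };
const script = { nodeName: 'SCRIPT', textContent: 'var s = "mytext1";', children: [] };
const wrapper = { nodeName: 'DIV', textContent: 'some mytext1 here', children: [para] };

const matches = [para, script, wrapper]
  .filter((n) => isLeafWithText(n, ['mytext1', 'mytext2']));
console.log(matches.length); // 1
```

Inside the iterator, the callback would then reduce to (node) => isLeafWithText(node, ['mytext1', 'mytext2']) ? NodeFilter.FILTER_ACCEPT : NodeFilter.FILTER_REJECT.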