I'm trying to find the correct way to traverse an HTML AST, find all the nodes with type: 'tag', and push them into an array.
I'm using html-parse-stringify to convert my HTML into an AST, if that helps with anything.
I've watched some videos on traversing HTML ASTs on YouTube, but they all start with a single object as the root node, whereas I'm starting with an array. I doubt that is much of a problem, though.
The data I'm working with is a website's scraped HTML, which is then converted into an AST using the library mentioned above.
From here I just want to create a basic looping structure that can fully traverse the AST, filter out all the unnecessary types such as text and comment, and push the remaining tag objects into an array.
Here is the data structure that I'm working with; I've included an empty version of it below for ease of copying.
I would also like to keep the number of loops as low as possible for the sake of time complexity.
function mainLoop(node) {
  Array.prototype.forEach.call(node, parent => {
    console.log(parent.name);
    const children = parent.children.filter(n => n.type !== 'text' && n.type !== 'comment');
    loop(children)
  })
}
function loop(children) {
  console.log(children.name)
  if (children) {
    Array.prototype.forEach.call(children, child => {
      loop(child);
    })
  }
}
mainLoop();
Empty Data Structure
const docTree = [
  {
    attrs: {
      class: "flex flex-col h-screen",
    },
    children: [
      {
        type: 'tag',
        name: 'main',
        attrs: {
          class: ''
        },
        children: [],
      }
    ],
    name: 'div',
    type: 'tag',
    voidElement: false,
  }
]
If your only goal is to remove text and comments, then it's pretty straightforward in a single reduce:
const traverse = (nodes) => {
  return nodes.reduce((acc, node) => {
    if (node.type === 'text' || node.type === 'comment') return acc;
    return [ ...acc, { ...node, children: traverse(node.children) } ]
  }, []);
}
I haven't actually run this code, but I think it'll work.
If you want to flatten all the children into a single array, you can do this:
const traverse = (nodes) => {
  return nodes.reduce((acc, {children = [], ...node}) => {
    if (node.type === 'text' || node.type === 'comment') return acc;
    return [ ...acc, node, ...traverse(children) ]
  }, []);
}
EDIT 2: Ah, I missed the part where you only want the type tag. That's done with this:
const traverse = (nodes) => {
  return nodes.reduce((acc, {children = [], ...node}) => {
    if (node.type !== 'tag') return acc;
    return [ ...acc, node, ...traverse(children) ]
  }, []);
}
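To make it easier to check what the tag-only version returns, here is a small sketch that runs it against a made-up tree. The name sampleTree and its contents are hypothetical, and the shapes of the text and comment nodes (a content field and a comment field) are my assumption about what html-parse-stringify emits, so adjust them to match your real data.

```javascript
// Tag-only flattening traversal, same logic as the snippet above.
const traverse = (nodes) =>
  nodes.reduce((acc, { children = [], ...node }) => {
    if (node.type !== 'tag') return acc;
    return [...acc, node, ...traverse(children)];
  }, []);

// Hypothetical sample tree; the text/comment node shapes are assumed.
const sampleTree = [
  {
    type: 'tag',
    name: 'div',
    voidElement: false,
    attrs: { class: 'flex flex-col h-screen' },
    children: [
      { type: 'text', content: 'hello' },
      {
        type: 'tag',
        name: 'main',
        voidElement: false,
        attrs: { class: '' },
        children: [
          { type: 'comment', comment: ' note ' },
          { type: 'tag', name: 'p', voidElement: false, attrs: {}, children: [] },
        ],
      },
    ],
  },
];

const tags = traverse(sampleTree);
const tagNames = tags.map((n) => n.name);
console.log(tagNames); // [ 'div', 'main', 'p' ]
```

Note that the returned objects no longer carry a children property, because the destructuring in the reducer strips it off before the node is pushed.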
Also, I'm not sure if you want the children to remain as part of the parent node or not. This here might also be what you want:
const traverse = (nodes) => {
  return nodes.reduce((acc, node) => {
    if (node.type !== 'tag') return acc;
    return [ ...acc, node, ...traverse(node.children) ]
  }, []);
}
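Since the question mentions time complexity: the reduce versions above copy the accumulated array with spread on every step, so the total work can grow faster than linearly on large trees. If that turns out to matter, one alternative is to push into a single shared array and walk the tree with an explicit stack. This is only a sketch; collectTags is a hypothetical name, and like the last snippet it leaves children attached to each node.

```javascript
// Hypothetical helper: collects every 'tag' node in document order,
// visiting each node once and pushing into one shared result array.
function collectTags(nodes) {
  const result = [];
  // Explicit stack instead of recursion; the roots are reversed so that
  // the first root is the first one popped.
  const stack = [...nodes].reverse();
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.type !== 'tag') continue; // skip text, comment, and anything else
    result.push(node);
    const children = node.children || [];
    // Push children in reverse so they are popped in their original order.
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return result;
}

// The empty data structure from the question.
const docTree = [
  {
    attrs: { class: 'flex flex-col h-screen' },
    children: [
      { type: 'tag', name: 'main', attrs: { class: '' }, children: [] },
    ],
    name: 'div',
    type: 'tag',
    voidElement: false,
  },
];

const found = collectTags(docTree);
console.log(found.map((n) => n.name)); // [ 'div', 'main' ]
```

Starting from an array rather than a single root object is not a problem here, since the array simply seeds the stack.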