I'm using JSDOM to set up HTML for processing.
async function processHtml(input) {
  const dom = new JSDOM(input)
  const tables = dom.window.document.getElementsByTagName('tbody')
  for (let x of tables) {
    if (x.getElementsByTagName('tr').length === 1) {
      const test = [...x.getElementsByTagName('tr')]
      console.log("Line 32:", test)
    } else {
      console.log("Line 32:", x.getElementsByTagName('tr').length)
    }
  }
}
What I'm getting from this algorithm is:
Line 32: HTMLTableRowElement {}
Line 32: 11
Line 32: 10
Line 32: 10
Line 32: HTMLTableRowElement {}
Line 32: HTMLTableRowElement {}
Line 32: 11
Line 32: 12
Line 32: 3
Line 32: HTMLTableRowElement {}
I'm stuck. These are not regular objects? How do I process them?
Note
How do I use DOM methods on HTMLTableRowElement { }?
Update 1: Change function
I want to see what I'm working with here.
async function processHtml(input) {
  const dom = new JSDOM(input)
  const tables = dom.window.document.getElementsByTagName('tbody')
  Object.keys(tables).forEach(x => console.log(tables[x]))
}
This function returns:
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
So it seems like this is going to be a pattern. I haven't a clue what tools are available to help me deal with this properly.
Some ideas on cutting through this would be appreciated. Thank you.
Update 2: If someone else finds this question useful
This algorithm brought me closer to the solution I was seeking. Thanks to the accepted answer.
async function processHtml(input) {
  const dom = new JSDOM(input)
  // Look only at the 5th and 6th tbody (indexes 4 and 5)
  Array.from(dom.window.document.querySelectorAll('table tbody')).forEach((tbody, i) => {
    if (i === 4 || i === 5) {
      console.log(`========= ${i} ============`)
      // Print the markup of the first two cells in that tbody
      Array.from(tbody.querySelectorAll('td')).forEach((td, j) => {
        if (j === 0 || j === 1) {
          console.log(`[${j}]`, td.innerHTML)
        }
      })
      console.log('===========================')
    }
  })
}
You have some options. First off, if you want to iterate through them with their default iteration behavior, you need to use for...of, like you did.
If you also want to use Array methods, you can convert a NodeList or an HTMLCollection to an array with either of:
Array.prototype.slice.call(...)
Array.from(...)
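To make the two conversions concrete, here is a minimal sketch that runs in plain Node without a DOM. The `arrayLike` object and its values are hypothetical stand-ins for an HTMLCollection or NodeList; the only thing they share with the real collections is the indexed-keys-plus-length shape that both conversions rely on.

```javascript
// A plain array-like object: indexed keys plus a length, but no Array methods.
// It stands in for an HTMLCollection or NodeList (hypothetical data).
const arrayLike = { 0: 'tbody-a', 1: 'tbody-b', 2: 'tbody-c', length: 3 };

// Option 1: borrow Array.prototype.slice, which also works in pre-ES2015 runtimes.
const viaSlice = Array.prototype.slice.call(arrayLike);

// Option 2: Array.from, which additionally accepts an optional mapping function.
const viaFrom = Array.from(arrayLike, (name) => name.toUpperCase());

console.log(Array.isArray(arrayLike)); // false
console.log(Array.isArray(viaSlice));  // true
console.log(viaFrom);
```

Both results are real arrays, so map, filter, and the other Array methods are available on them.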
Array.from(document.querySelectorAll('table tbody')).forEach(tbody => {
  // do something with tbody
  Array.from(tbody.querySelectorAll("tr")).forEach(tr => {
    // do something with tr
  })
})
In the above example, change document to dom.window.document. If you prefer, you could use the getElementsByTagName method instead of querySelectorAll.
getElementsByClassName and getElementsByTagName return a live HTMLCollection, meaning the returned object is array-like but not an array, and it updates as you change the DOM. querySelectorAll returns a NodeList, which is similar to an HTMLCollection but does NOT update. Both have legacy methods like item to get a node by index, but I suggest converting them to arrays first.
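In the example above, instead of the inner forEach loop, you could also use Array.from(tbody.childNodes) and check whether a given item's tagName property equals TR, then proceed accordingly. Keep in mind that childNodes also contains text nodes (such as the whitespace between rows), and those have no tagName.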
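The childNodes approach could look roughly like this. `directRows` is a hypothetical helper name, and `fakeTbody` is a plain-object stand-in for a real tbody element so the snippet runs without a DOM; the function only relies on the childNodes and tagName properties, so a real element from dom.window.document should work the same way.

```javascript
// Return only the TR children of a tbody-like node (hypothetical helper).
function directRows(tbody) {
  // Text nodes have no tagName, so the comparison filters them out.
  return Array.from(tbody.childNodes).filter((node) => node.tagName === 'TR');
}

// Stand-in for a parsed <tbody>: two rows separated by whitespace text nodes.
const fakeTbody = {
  childNodes: [
    { nodeType: 3, nodeValue: '\n  ' }, // text node
    { nodeType: 1, tagName: 'TR' },     // element node
    { nodeType: 3, nodeValue: '\n  ' }, // text node
    { nodeType: 1, tagName: 'TR' },     // element node
  ],
};

console.log(directRows(fakeTbody).length); // 2
```

Unlike querySelectorAll("tr"), this only returns direct children, so rows of a table nested inside a cell are not included. With a real element, tbody.children would skip the text nodes up front.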
You have many options, depending on what you prefer. I suggest going through the MDN docs for Node and Element.