Tags: javascript, jsdom

How do I process HTMLCollection {}?


I'm using JSDOM to parse HTML for processing.

const { JSDOM } = require('jsdom')

async function processHtml(input) {
  const dom = new JSDOM(input)
  const tables = dom.window.document.getElementsByTagName('tbody')
  for (let x of tables) {
    // Log the row itself when a tbody has exactly one row, otherwise the row count
    if (x.getElementsByTagName('tr').length === 1) {
      const test = [...x.getElementsByTagName('tr')]
      console.log("Line 32:", test)
    } else {
      console.log("Line 32:", x.getElementsByTagName('tr').length)
    }
  }
}

This is the output I get:

Line 32: HTMLTableRowElement {}
Line 32: 11
Line 32: 10
Line 32: 10
Line 32: HTMLTableRowElement {}
Line 32: HTMLTableRowElement {}
Line 32: 11
Line 32: 12
Line 32: 3
Line 32: HTMLTableRowElement {} 

I'm stuck. Are these not regular objects? How do I process them?

Note

How do I use DOM methods on HTMLTableRowElement {}?

Update 1: Changed the function

I want to see what I'm working with here.

async function processHtml(input) {
  const dom = new JSDOM(input)
  const tables = dom.window.document.getElementsByTagName('tbody')

  Object.keys(tables).forEach(x => console.log(tables[x]))
}

This function logs:

HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}
HTMLTableSectionElement {}

So this seems to be a pattern. I have no idea what tools are available to deal with it properly.

Some ideas on cutting through this would be appreciated. Thank you.

Update 2: If someone else finds this question useful

This function got me closer to the solution I was looking for, thanks to the accepted answer.

const { JSDOM } = require('jsdom')

async function processHtml(input) {
  const dom = new JSDOM(input)
  Array.from(dom.window.document.querySelectorAll('table tbody')).forEach((tbody, i) => {
    // Only inspect the 5th and 6th tbody (indexes 4 and 5)
    if (i === 4 || i === 5) {
      console.log(`========= ${i} ============`)
      Array.from(tbody.querySelectorAll('td')).forEach((td, j) => {
        // Print only the first two cells of this tbody
        if (j === 0 || j === 1) {
          console.log(`[${j}]`, td.innerHTML)
        }
      })
      console.log('===========================')
    }
  })
}

Solution

  • You have some options. First off, if you want to iterate through them with their default iteration behavior, you need to use for...of, like you did.

    If you also want to use Array methods, you can convert a NodeList or a live HTMLCollection to an array with:

    • pre-ES6: Array.prototype.slice.call(...)
    • ES6: Array.from(...)
    Array.from(document.querySelectorAll('table tbody')).forEach(tbody=>{
        //do something with tbody
        Array.from(tbody.querySelectorAll("tr")).forEach(tr => {
            //do something with tr
        })
    })
    

    In the example above, replace document with dom.window.document. If you prefer, you can use the getElementsByTagName method instead of querySelectorAll.

    getElementsByClassName and getElementsByTagName return a live HTMLCollection: the returned object is array-like but not an array, and it updates automatically as the DOM changes. querySelectorAll returns a NodeList, which is similar to an HTMLCollection but static: it does NOT update. Both have older methods such as item() to get a node by index, but I suggest converting them to arrays first.

    In the nested tbody/tr example above, instead of selecting rows with querySelectorAll("tr") in the inner loop, you could also use Array.from(tbody.childNodes) and check whether each item's tagName property equals TR (tag names of HTML elements are uppercase), then proceed accordingly.

    There are many options, depending on what you prefer; I suggest reading through the MDN docs for Node and Element.