javascript node.js queryselector

How to best handle variable item count (and order) when selecting from unordered list


I read innerText properties from list items inside a ul element that can contain anywhere from 4 to 10 li items. Some items, like profile name, age and location, are always present; others, like current term, prior degree and other information about a student profile, are optional. So the list has a different length for almost every profile, and the :nth-child(x) element contains different information each time. I want to feed that data to an object constructor that expects, say, the degree as the 5th argument.

How would you go about checking which information is present in the list and setting a placeholder like "n.a." for missing values? Is that something I should even try to do inside my Node script, or is that a job for later, in the database?

My puppeteer function to get the elements via their querySelectors up to that problem looks like this:

var ratingDetails = await page.evaluate(() => {

   // get each element (that could be available) from a div

   let text = document.querySelector("div.report-text").innerText
   let age = document.querySelector(
             "div.card-block > ul.list-unstyled > li:nth-child(1) > span").innerText
   let sex = document.querySelector(
             "div.card-block > ul.list-unstyled > li:nth-child(2) > span").innerText
   let startYear = document.querySelector(
             "div.card-block > ul.list-unstyled > li:nth-child(3) > span").innerText
   let studyForm = document.querySelector(
             "div.card-block > ul.list-unstyled > li:nth-child(4) > span").innerText
   let location = document.querySelector(
             "div.card-block > ul.list-unstyled > li:nth-child(5) > span").innerText

     [...and some more...]

})
    
// and then use the spread syntax to fill my constructor

await ratingDetails.map(facts => new ReportObject(...facts));

Many thanks for any advice on how to handle that issue!


Solution

  • After a lot of trial and error I came up with the following solution:

    1. Loop over every li element in the unordered list and grab the innerText properties
    let text = [];
    for (let counter = 1; counter <= metaListe; counter++) {
      text = await page.evaluate((counter) => {
        let liElements = document.querySelector(`div.card-block > ul.list-unstyled > li:nth-child(${counter})`).innerText.trim();
        return liElements;
      }, counter);

    2. Define some regex patterns for all possible li items
    const patt_jahrStudBeginn = /^Studienbeginn/;
    const patt_abschluss = /^Abschluss/i;
    const patt_aktFS = /^Aktuelles/;
    const patt_studienForm = /^Studienform/;
    [and some more...]

    3. Compare the innerText properties from step 1 with the patterns and store the value in a variable on a match (then continue with the next string/innerText)
    if (!document.querySelector(`div.card-block > ul.list-unstyled > li:nth-child(${counter})`)) {
      return;
    } else {
      if (patt_studienForm.test(text)) {
        let studForm = document.querySelector(`div.card-block > ul.list-unstyled > li:nth-child(${counter}) > span`).innerText;
      } else {
        if (patt_studienDauer.test(text)) {
          let studDauer = document.querySelector(`div.card-block > ul.list-unstyled > li:nth-child(${counter}) > span`).innerText;
        } else {
          if (patt_jahrStudBeginn.test(text)) {
            let jahrBeginn = document.querySelector(`div.card-block > ul.list-unstyled > li:nth-child(${counter}) > span`).innerText;
          } else {
            if (patt_aktFS.test(text)) {
              let aktFS = document.querySelector(`div.card-block > ul.list-unstyled > li:nth-child(${counter}) > span`).innerText;
    [...and more...]
    

    Finally, all variables containing the different pieces of information are returned from the page.evaluate() function. It took me quite some time to understand that I have to pass any counting variable to the .evaluate() method in order to use the current loop index inside it to refer to the n-th list element.
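Why the loop index has to be passed explicitly can be illustrated without a browser. Puppeteer serializes the function given to page.evaluate() and runs it inside the page, so variables from the surrounding Node scope are not visible there; only the extra arguments are transferred. The sketch below is a rough simulation of that behaviour in plain Node, not Puppeteer's actual implementation; runSerialized and demo are hypothetical helper names.

```javascript
// Rough simulation (an assumption, not Puppeteer internals): rebuild the
// function from its source text, which drops the closure it was defined in.
function runSerialized(fn, ...args) {
  const rebuilt = new Function(`return (${fn.toString()})`)();
  return rebuilt(...args);
}

function demo() {
  const counter = 3; // local variable, like the loop index in the Node script

  // 1) Relying on the closure fails: "counter" does not exist where the
  //    rebuilt function runs.
  let closureError = null;
  try {
    runSerialized(() => `li:nth-child(${counter})`);
  } catch (err) {
    closureError = err.name;
  }

  // 2) Passing the value as an argument works, mirroring
  //    page.evaluate((counter) => ..., counter).
  const selector = runSerialized((n) => `li:nth-child(${n})`, counter);

  return { closureError, selector };
}

console.log(demo());
```

The second call has the same shape as the working page.evaluate() call in step 1: the value travels as an argument instead of through the closure.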

    That deeply nested if-condition can't be good code. I will probably ask how to improve that type of comparison with an array in a separate question. But it works as it is.
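One possible way to flatten the nested if-chain with an array is sketched here, under some assumptions. Each pattern from step 2 is paired with an output key in a lookup table, every key starts out as the "n.a." placeholder, and each list item overwrites the key whose pattern it matches. The names FIELD_PATTERNS, PLACEHOLDER and mapListItems are hypothetical, the input shape (li text plus span value) is an assumption about how the data is collected, and the sample values are made up for illustration.

```javascript
// Hypothetical lookup table: output key + the pattern that identifies the item.
// The patterns are the ones from step 2; extend the table with the others.
const FIELD_PATTERNS = [
  { key: "jahrBeginn", pattern: /^Studienbeginn/ },
  { key: "abschluss", pattern: /^Abschluss/i },
  { key: "aktFS", pattern: /^Aktuelles/ },
  { key: "studForm", pattern: /^Studienform/ },
];

const PLACEHOLDER = "n.a.";

// items: [{ text: innerText of the whole li, value: innerText of its span }]
function mapListItems(items) {
  // Start with every known field set to the placeholder ...
  const result = {};
  for (const { key } of FIELD_PATTERNS) {
    result[key] = PLACEHOLDER;
  }

  // ... then overwrite the fields that actually occur, in whatever order.
  for (const { text, value } of items) {
    const field = FIELD_PATTERNS.find(({ pattern }) => pattern.test(text.trim()));
    if (field) {
      result[field.key] = value.trim();
    }
  }
  return result;
}

// Made-up sample input: two items in arbitrary order, the rest missing.
const details = mapListItems([
  { text: "Studienform: Vollzeit", value: "Vollzeit" },
  { text: "Studienbeginn: 2015", value: "2015" },
]);
console.log(details);
```

Because the result is an object with fixed keys, the constructor can read fields by name instead of depending on argument position. The items array could presumably be collected in a single call such as page.$$eval("div.card-block > ul.list-unstyled > li", ...) that returns each li's text and span text, which would also avoid one page.evaluate() round trip per list item.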