Tags: javascript, jquery, google-apps-script, cheerio

Improving loops and tasks to scrape multiple paths whose values I need to match


The page elements are in this pattern:

<ul class="scorer-info">
 <li class="  ">
   <span class="scorer">
       <a href="/players/maximilian-mittelstadt/359295/">
         M. Mittelstädt
       </a>

       

         <span class="minute">
           3'
         </span>

       

         <span class="assist">
           (assist by
           <a href="/players/salomon-kalou/2540/">S. Kalou</a>)
         </span>

   </span>

   <span class="score">1 - 0</span>

   <span class="scorer">
  </span>
   <div class="clearfix"></div>
 </li>
 <li class="  ">
   <span class="scorer">
   </span>

   <span class="score">1 - 1</span>

   <span class="scorer">

         <span class="minute">
           7'
         </span>


       <a href="/players/serge--gnabry/213651/">
         S. Gnabry
       </a>

       

  </span>
   <div class="clearfix"></div>
 </li>
 <li class="  ">
   <span class="scorer">
   </span>

   <span class="score">1 - 2</span>

   <span class="scorer">

         <span class="minute">
           49'
         </span>


       <a href="/players/serge--gnabry/213651/">
         S. Gnabry
       </a>

       

         <span class="assist">
           (assist by
           <a href="/players/james-david-rodriguez/72408/">J. Rodríguez</a>)
         </span>
  </span>
   <div class="clearfix"></div>
 </li>
 <li class="  ">
   <span class="scorer">
       <a href="/players/davie-selke/213931/">
         D. Selke
       </a>

       

         <span class="minute">
           67'
         </span>

       


   </span>

   <span class="score">2 - 2</span>

   <span class="scorer">
  </span>
   <div class="clearfix"></div>
 </li>
 <li class="  ">
   <span class="scorer">
   </span>

   <span class="score">2 - 3</span>

   <span class="scorer">

         <span class="minute">
           98'
         </span>


       <a href="/players/kingsley-coman/265385/">
         K. Coman
       </a>

       

         <span class="assist">
           (assist by
           <a href="/players/robert-lewandowski/41310/">R. Lewandowski</a>)
         </span>
  </span>
   <div class="clearfix"></div>
 </li>
</ul>

The values collected by the paths I need are:

span.minute span.score
3' 1 - 0
7' 1 - 1
49' 1 - 2
67' 2 - 2
98' 2 - 3

Every span.minute has a corresponding span.score, and I want to find the last span.score whose span.minute is less than or equal to 90. In this example, the last one is 67' with a score of 2 - 2.

My Code:

function score() {
  var ss = SpreadsheetApp.getActive().getSheetByName('copy');
  var response = UrlFetchApp.fetch('URL URL URL URL', {muteHttpExceptions: true});
  if (response.getResponseCode() == 404) {
  } else {
    var contentText = response.getContentText();
    var $ = Cheerio.load(contentText);
    
    var list_minutes = [];
    var list_score = [];

    var minute_goal = $('ul.scorer-info > li > span.scorer > span.minute');
    var score_goal = $('ul.scorer-info > li > span.score');

    minute_goal.each((index, element) => {list_minutes.push([($(element).text().trim()).substring(0, ($(element).text().trim()).indexOf("'"))]);});
    score_goal.each((index, element) => {list_score.push([($(element).text().trim()).replace(/ /g,'')]);});

    var before_90 = '0-0';

    var i=0;
    var max = list_minutes.length
    for(i; i<max; i++){
      if (list_minutes[i][0] <= 90) {
        before_90 = list_score[i][0];
      }
    }
    Logger.log(before_90)
  }
}

Output Logger.log:

2-2

As you can see, this approach forces me to build several lists before a final loop can find the last score at or before the 90th minute.

Is there a better way to reduce the number of steps, ideally using a single loop over both paths, so that the code is shorter and runs faster?


Solution

  • In order to retrieve your expected value, how about the following modification?

    From:

    var list_minutes = [];
    var list_score = [];
    
    var minute_goal = $('ul.scorer-info > li > span.scorer > span.minute');
    var score_goal = $('ul.scorer-info > li > span.score');
    
    minute_goal.each((index, element) => {list_minutes.push([($(element).text().trim()).substring(0, ($(element).text().trim()).indexOf("'"))]);});
    score_goal.each((index, element) => {list_score.push([($(element).text().trim()).replace(/ /g,'')]);});
    
    var before_90 = '0-0';
    
    var i=0;
    var max = list_minutes.length
    for(i; i<max; i++){
      if (list_minutes[i][0] <= 90) {
        before_90 = list_score[i][0];
      }
    }
    Logger.log(before_90)
    

    To:

    var minute_goal = $('ul.scorer-info > li > span.scorer > span.minute').toArray();
    var score_goal = $('ul.scorer-info > li > span.score').toArray();
    var res = minute_goal.reduce((ar, e, i) => {
      var n = parseInt($(e).text(), 10);
      if (n <= 90) ar.push([n, $(score_goal[i]).text().trim().replace(/ /g, '')]);
      return ar;
    }, []).pop();
    console.log(res) // [ 67, '2-2' ]
    
    • When this modified script is run against the HTML you showed, the log shows [ 67, '2-2' ].
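
    • Note that the script above pairs minute_goal[i] with score_goal[i] by index, so it relies on every li holding exactly one span.minute. As a rough sketch only (not part of the modification above), the same lookup could also be written as a backward scan that stops at the first goal at or before the limit. The helper name lastGoalUpTo and the { minute, score } row shape are hypothetical, chosen for illustration, and the sketch assumes the goals appear in chronological order as they do in your HTML. It runs on plain sample data so it can be tried without Cheerio; the commented lines at the end show one way the rows might be built per li.

```javascript
// Hypothetical helper (an illustration, not the original answer's code):
// given rows of { minute, score } text pairs in page order, return
// [minute, score] for the last goal at or before `limit`, or null if none.
function lastGoalUpTo(rows, limit) {
  // Scan from the end so the loop stops at the first qualifying goal.
  for (var i = rows.length - 1; i >= 0; i--) {
    var n = parseInt(rows[i].minute, 10); // "67'" -> 67; leading whitespace is ignored
    if (!isNaN(n) && n <= limit) {
      return [n, rows[i].score.trim().replace(/ /g, '')];
    }
  }
  return null;
}

// Sample rows copied from the HTML in the question.
var rows = [
  { minute: "3'", score: '1 - 0' },
  { minute: "7'", score: '1 - 1' },
  { minute: "49'", score: '1 - 2' },
  { minute: "67'", score: '2 - 2' },
  { minute: "98'", score: '2 - 3' }
];

console.log(lastGoalUpTo(rows, 90)); // [ 67, '2-2' ]

// Assumption: with Cheerio loaded as `$`, the rows could be built one <li>
// at a time, which keeps each minute next to the score of the same item:
// var rows = $('ul.scorer-info > li').toArray().map(function (li) {
//   return {
//     minute: $(li).find('span.minute').text(),
//     score: $(li).find('span.score').text()
//   };
// });
```

    • Returning null when no goal qualifies lets the caller choose the fallback, for example '0-0' as in your original script.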