Search code examples
google-apps-scriptweb-scrapinghtml-tablecheerio

How to pull HTML table data with Cheerio in Google Apps Script?


Inspired by such a convenience of Cheerio from this answer, I'm trying to use it in the following codes. The codes was able to pull any table data by calling class="snapshot-td2", but I'm interested in getting only those in the first table. How can I do that? The URL has two tables having class="snapshot-td2". And it retrieved them in string. How can I get them in array? Thank you for any help!

function test() {
  const url = 'https://finviz.com/quote.ashx?t=AFRM';
  const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
  const page = Cheerio.load(res);

  // The next line returned the tableValue from two tables of having class="snapshot-td2".
  // What should be modified to get the tableValue only from the first table? 
  // The next line returned the tableValue in string.  What should be modified to get them in array?
  var tableValue = page('.snapshot-td2').text();
  console.log(tableValue);
}

enter image description here


Solution

  • I'm not an expert in jQuery by any means so probably my solutions is quite stupid. But it works:

    function test2() {
      const url = 'https://finviz.com/quote.ashx?t=AFRM';
      const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true }).getContentText();
      const $ = Cheerio.load(res);
    
      var data = $('table.snapshot-table2').find('td').toArray().map(x => $(x).text());
    
      var table = []
      for (var i=0; i<data.length; i+=12) {
        row = [];
        for (var j=0; j<12; j++) row.push(data[i+j]);
        table.push(row);
      }
    
      var range = SpreadsheetApp.getActiveSheet().getRange(1,1,table.length,table[0].length);
      range.setValues(table);
    }
    

    enter image description here

    If you want an array (not a table) the data is the array. Every even element of which [0,2,4...] is a name, and every odd element [1,3,5...] is a value.

    You can convert it into 2 columns [[name, value], [name, value]...] pretty easy:

    var table = [];
    for (var i=0; i<data.length; i+=2) table.push(data[i], data[i+1]);
    

    Or into an object {name:value, name:value, name:value...}:

    var obj = {};
    for (var i=0; i<data.length; i+=2) obj[data[i]] = data[i+1]);