Search code examples
google-apps-scriptgoogle-sheetsweb-scrapingurlfetch

How to Find find number of tables and importing last table data to googlesheets


I have a situation, where I have dynamic HTML pages, and number of tables are varies according to data. Now I am using the Google Sheets IMPORTHTML function to import the data, but every time I need to count number tables and input index.

I can use the below code to get tables count manually inside browser console log.

var i = 1; [].forEach.call(document.getElementsByTagName("table"),
   function(x) { console.log(i++, x); });

Please help me to get the last(or max) table index every time I input the html page url code.

function doGet() {
  var html = UrlFetchApp.fetch('http://allqs.saqa.org.za/showUnitStandard.php?id=7743').getContentText();
  var table = getElementsByClassName(html, 'table')[0];

  var i = 1; [].forEach.call(document.getElementsByTagName("table"),
       (i++, x);
    console.log (i)

I need to have a simple Apps Script function to get the table index to print in Google Sheets.


Solution

  • From I need to have a simple appscript function to get the table index to print in googlesheet., I understood your goal is as follows.

    • You want to retrieve the number of tables in the HTML of your URL of http://allqs.saqa.org.za/showUnitStandard.php?id=7743 using Google Apps Script.

    In this case, how about the following sample script?

    Sample script:

    Please copy and paste the following script to the script editor of Spreadsheet, and save the script. When you put a custom function of =SAMPLE("http://allqs.saqa.org.za/showUnitStandard.php?id=7743"), the script is run and the number of tables is returned.

    function SAMPLE(url) {
      const html = UrlFetchApp.fetch(url).getContentText();
      const res = [...html.matchAll(/<table/g)]; // or var res = [...html.matchAll(/<table.*>[\s\S\w]*?<\/table>/g)];
      return res.length;
    }
    
    • When your URL of http://allqs.saqa.org.za/showUnitStandard.php?id=7743 is used, 96 is returned.

    • As another method, when split is used, the following sample script might be able to be also used.

      function SAMPLE(url) {
        return UrlFetchApp.fetch(url).getContentText().split("<table").length - 1;
      }
      

    Reference: