Search code examples
google-apps-scriptgoogle-sheetsgoogle-sheets-formulacustom-function

Creating a UrlFetchApp script to replace the Google Sheet importHTML function


I used the following formula for about a year now and suddenly it stopped working/importing the table.

=IMPORTHTML("https://tradingeconomics.com/matrix";"table";1)

It gives me a "Could not fetch url: https://tradingeconomics.com/matrix" error.

I tried various things and one of the interesting findings was that the importHTML works for the cached version, but only in a new sheet under a different Google account. Furthermore, the cached version breaks randomly too.

Thus, it seems I won't get around using a script for this purpose.

Ideally, this script would be flexible enough, where it would have a dedicated function e.g. importHTMLtable where the user can input the URL and the table no. and it works. So it would work for the following functions I currently use e.g.

=importHTMLtable("https://tradingeconomics.com/matrix";"table";1)

OR

=importHTMLtable("https://tradingeconomics.com/country-list/business-confidence?continent=world";"table";1)

OR

=importHTMLtable("https://tradingeconomics.com/country-list/ease-of-doing-business";"table";1)

etc...

Not sure if this Github code solves this problem. It seems to only parse text?

As I would assume this is a fairly common problem users of Google Sheets have and would think there might already be an AppScript out there that does exactly this and might be faster in terms of importing speed too.

I can't program, so I tried copying and posting codes to see if I can get some code to work. No luck :(

Can anyone provide a code or maybe an existing app script (I'm not aware of) that does exactly this that?


Solution

  • Try this way

    =importTableHTML(A1,1)
    

    with

    function importTableHTML(url,n){
      var html = UrlFetchApp.fetch(url,{followRedirects : true,muteHttpExceptions: true}).getContentText().replace(/(\r\n|\n|\r|\t|  )/gm,"")
      const tables = [...html.matchAll(/<table[\s\S\w]+?<\/table>/g)];
      var trs = [...tables[n-1][0].matchAll(/<tr[\s\S\w]+?<\/tr>/g)];
      var data = [];
      for (var i=0;i<trs.length;i++){
    console.log(trs[i][0])
        var tds = [...trs[i][0].matchAll(/<(td|th)[\s\S\w]+?<\/(td|th)>/g)];
        var prov = [];
        for (var j=0;j<tds.length;j++){
          donnee=tds[j][0].match(/(?<=\>).*(?=\<\/)/g)[0];
          prov.push(stripTags(donnee));
        }
        data.push(prov);
      }
      return(data)
    }
    function stripTags(body) {
      var regex = /(<([^>]+)>)/ig;
      return body.replace(regex,"").replace(/&nbsp;/g,' ').trim();
    }
    

    enter image description here

    url-fetch-app#advanced-parameters

    matchAll