Search code examples
google-sheets

regexreplace multiple columns for football data


I've been trying without success to import using google sheets to scrape the various team table data from the url's https://www.pro-football-reference.com/years/2024/ and https://www.pro-football-reference.com/years/2024/opp.htm

I found this code from another post.

ImportXML XPath issue using Google Sheets on a simple web scraping query

It is so close to my need but I've been unable adapt it to similar data from the same website.

The query portion pulls the correct unparsed data.

QUERY(IMPORTDATA("https://www.pro-football-reference.com/years/2024/");"select Col1 where Col1 contains 'exp_pts_rush'")

So I believe my misunderstanding of REGEXREPLACE is the issue.

Any help is appreciated

=ARRAYFORMULA(SPLIT(QUERY(ARRAYFORMULA(REGEXREPLACE(ARRAYFORMULA(SUBSTITUTE(QUERY(IMPORTDATA("https://www.pro-football-reference.com/years/2024/");"select Col1 where Col1 contains 'exp_pts_rush'");"></td>";">0</td>"));".+ranker.+?>(.+?)<.+team.+([A-Z]{3}).+g.+?>(.+?)<.+?rush.+?>(.+?)<.+?rush.+?>(.+?)<.+?rush.+?>(.+?)<.+?rush.+?>(.+?)<.+?rush.+?>(.+?)<.+?rush.+?>(.+?)<.+?fum.+?>(.+?)<.+?exp.+";"$1;$2;$3;$4;$5;$6;$7;$8;$9;$10;$11"));"select * WHERE NOT Col1 contains '<'");";"))

I hope to a result like this. I hope to a result like this;


Solution

  • As an alternative, you can get the related data with this script;

    function getTable() {
      var url = "https://www.pro-football-reference.com/years/2024/";
     
      try {
        var options = {
            "method" : "GET",
            'headers': {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.172 Safari/537.36 Edg/117.0.2045.43'
            },
            'muteHttpExceptions': true
          };
    
        var response = UrlFetchApp.fetch(url,options);
        
        if (response.getResponseCode() === 200) {
          var html = response.getContentText();
          html = html.replace(/(\r\n|\n|\r)/gm,"");
          
          var tablesMatch = html.match(/<tbody(.*?)<\/tbody>/gm);
          
          if (tablesMatch && tablesMatch.length > 1) {
            var myTable = tablesMatch[2];
            //Logger.log(myTable);
            var doc = XmlService.parse(myTable); 
          
            var rows = doc.getDescendants().filter(function(c) {
              var element = c.asElement();
              return element && element.getName() == "tr";
            });
    
            var data = rows.slice(0).map(function(row) {
              return row.getChildren("td").map(function(cell) {
                return cell.getValue();
              });
            });
    
            Logger.log(data); 
            
            SpreadsheetApp.getActiveSheet().clearContents()
    
            var headers=[['Tm','G','Att','Yds','TD','Lng','Y/A','Y/G','Fmb','EXP']];
    
            SpreadsheetApp.getActiveSheet().getRange(1,1, headers.length, headers[0].length).setValues(headers).setFontWeight("bold");
    
            SpreadsheetApp.getActiveSheet().getRange(2, 1, data.length, data[0].length).setValues(data);
    
            } else {
              Browser.msgBox("No tables found");
            } 
          } else {
            Browser.msgBox( "Error: Cannot connect to URL . HTTP response code: " + response.getResponseCode());
            } 
          }
          catch (e) {
            Browser.msgBox(e);
            return;   
          }
    }
    

    .

    The below table shows the output on the sheet;

    .

    Tm G Att Yds TD Lng Y/A Y/G Fmb EXP
    Green Bay Packers 2 74 424 1 34 5.7 212 3 2.89
    Los Angeles Chargers 2 71 395 2 61 5.6 197.5 1 3.82
    New Orleans Saints 2 76 370 6 17 4.9 185 1 16.19
    Arizona Cardinals 2 65 355 2 41 5.5 177.5 3 11.85
    New England Patriots 2 75 355 2 45 4.7 177.5 2 5.47
    Washington Commanders 2 65 353 3 40 5.4 176.5 3 11.67
    Baltimore Ravens 2 59 336 2 29 5.7 168 1 14.25
    Philadelphia Eagles 2 75 330 3 34 4.4 165 2 4.62
    Detroit Lions 2 58 302 3 24 5.2 151 0 19.81
    Houston Texans 2 62 288 1 18 4.6 144 1 -7.31
    San Francisco 49ers 2 63 282 3 24 4.5 141 3 1.92
    Pittsburgh Steelers 2 77 278 0 20 3.6 139 2 -13.26
    Tennessee Titans 2 54 270 2 26 5 135 2 5.65
    Minnesota Vikings 2 50 257 1 25 5.1 128.5 4 4.92
    Jacksonville Jaguars 2 47 255 2 33 5.4 127.5 3 -2.01
    Indianapolis Colts 2 40 244 2 29 6.1 122 3 14.3
    Atlanta Falcons 2 50 241 0 19 4.8 120.5 2 -4.74
    Buffalo Bills 2 59 238 4 49 4 119 2 0.33
    Kansas City Chiefs 2 52 221 2 21 4.3 110.5 1 4.23
    Miami Dolphins 2 59 220 1 17 3.7 110 0 -6.94
    Cleveland Browns 2 48 218 2 36 4.5 109 2 3.86
    New York Giants 2 43 203 1 22 4.7 101.5 2 0.49
    Seattle Seahawks 2 52 192 3 34 3.7 96 1 -5.81
    Tampa Bay Buccaneers 2 53 182 1 31 3.4 91 2 -5.94
    Dallas Cowboys 2 46 170 1 12 3.7 85 2 0.43
    New York Jets 2 43 169 2 30 3.9 84.5 1 0.25
    Denver Broncos 2 44 163 1 23 3.7 81.5 2 -3.18
    Chicago Bears 2 44 155 1 24 3.5 77.5 3 -1.48
    Carolina Panthers 2 38 148 1 23 3.9 74 1 -3.27
    Cincinnati Bengals 2 38 144 1 16 3.8 72 6 3.84
    Los Angeles Rams 2 43 136 2 12 3.2 68 1 -1.84
    Las Vegas Raiders 2 39 98 1 12 2.5 49 3 -21.03