r, web-scraping, rvest

Webscraping Pro Football Reference


I am trying to webscrape the Defence Table from the following page: https://www.pro-football-reference.com/boxscores/202402110kan.htm

Note that there are multiple tables on this page, so you need to scroll down a bit to see the defence table.

I have used the following code:

library(rvest)

url <- "https://www.pro-football-reference.com/boxscores/202402110kan.htm"
table_defence <- url %>%
  read_html() %>%
  html_node('#div_all_player_defense') %>%
  html_table()

However, I get the following error:

Error in UseMethod("html_table") : no applicable method for 'html_table' applied to an object of class "xml_missing"

I am not sure whether the issue is that the URL ends in .htm rather than .html.

I have tried using rvest, but I am open to another solution if that works.


Solution
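
Some context on the error: the .htm extension is not the cause, since read_html() parses whatever markup the server returns. An "xml_missing" object means the selector matched nothing in the parsed document. The comment-stripping step in the code further down suggests the table is wrapped in an HTML comment, which the parser treats as text, not as elements. The sketch below reproduces that on a made-up page (the HTML string and the names page, before, hidden and toy are illustrative assumptions, not taken from the site) and shows one possible alternative to the regex: collecting the comment nodes with xml_find_all() and parsing their contents.

```r
library(rvest)
library(xml2)

# Toy page (an assumption for illustration): the table is wrapped in a comment.
page <- read_html('
<div id="all_player_defense">
<!--
<table id="player_defense">
  <tr><th>Player</th><th>Tm</th></tr>
  <tr><td>Fred Warner</td><td>SFO</td></tr>
</table>
-->
</div>')

# The table is not an element yet, so the selector matches nothing.
before <- html_node(page, "#player_defense")
class(before)

# Collect every comment node and parse the commented-out markup as HTML.
comments <- xml_find_all(page, "//comment()")
hidden <- read_html(paste(xml_text(comments), collapse = ""))

# Now the selector finds the table.
toy <- hidden %>%
  html_node("#player_defense") %>%
  html_table()
toy
```

Compared with removing every "<!--" and "-->" in the raw string, this leaves the rest of the page untouched, at the cost of parsing the comments as a separate document.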

  • library(rvest)
    library(stringr)
    
    url <- "https://www.pro-football-reference.com/boxscores/202402110kan.htm"
    
    # Ingest HTML from URL and convert to string.
    html <- as.character(read_html(url))
    
    # Use regular expressions to remove comments
    cleaned <- str_remove_all(html, "(<!--|-->)")
    
    # Ingest HTML from string.
    html <- read_html(cleaned)
    
    # Optional check: list the table containers that are now visible.
    html %>%
      html_nodes("div.table_container")
    
    defense <- html %>%
      html_node("#player_defense") %>%
      html_table()
    
    # Promote the first row (actually the second <th> row) to column names.
    colnames(defense) <- as.character(unlist(defense[1, ]))
    # Drop the first row.
    defense <- defense[-1, ]
    
    defense
    

    The results look like this:

       Player            Tm    Int   Yds   TD    Lng   PD    Sk    Comb  Solo  Ast   TFL   QBHits FR    Yds   TD    FF   
       <chr>             <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>  <chr> <chr> <chr> <chr>
     1 Ji'Ayir Brown     SFO   1     0     0     0     1     0.0   11    7     4     0     0      0     0     0     0    
     2 Arik Armstead     SFO   0     0     0     0     0     1.0   6     3     3     1     1      0     0     0     0    
     3 Javon Hargrave    SFO   0     0     0     0     0     1.0   6     3     3     1     1      1     0     0     0    
     4 Chase Young       SFO   0     0     0     0     0     1.0   2     1     1     1     2      0     0     0     0    
     5 Fred Warner       SFO   0     0     0     0     0     0.0   13    9     4     0     0      0     0     0     0    
     6 Deommodore Lenoir SFO   0     0     0     0     0     0.0   8     4     4     0     0      0     0     0     1    
     7 Logan Ryan        SFO   0     0     0     0     0     0.0   7     3     4     0     0      0     0     0     1    
     8 Nick Bosa         SFO   0     0     0     0     0     0.0   6     4     2     2     3      0     0     0     0    
     9 Oren Burks        SFO   0     0     0     0     0     0.0   5     3     2     0     0      0     0     0     0    
    10 Tashaun Gipson    SFO   0     0     0     0     0     0.0   5     4     1     0     0      0     0     0     0
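
Every column in the output above is character, and the names Yds and TD each appear twice. A minimal follow-up sketch in base R, assuming you want numeric columns and unique names: it uses a small hand-built data frame (raw and tidy are hypothetical names, and only a few of the columns are reproduced) in place of the scraped table.

```r
# Toy stand-in for the scraped table: header still in row 1, all character.
raw <- data.frame(
  X1 = c("Player", "Fred Warner", "Arik Armstead"),
  X2 = c("Tm", "SFO", "SFO"),
  X3 = c("Yds", "0", "0"),
  X4 = c("Sk", "0.0", "1.0"),
  X5 = c("Yds", "0", "0"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names; make.unique() renames the second Yds.
names(raw) <- make.unique(as.character(unlist(raw[1, ])))
tidy <- raw[-1, ]
rownames(tidy) <- NULL

# Convert each column to the narrowest type that fits; as.is = TRUE keeps
# text columns as character instead of turning them into factors.
tidy <- type.convert(tidy, as.is = TRUE)
str(tidy)
```

The same two calls should apply to the real defense table once its header row has been promoted, though the suffixes that make.unique() adds (for example Yds.1) may be worth replacing with more descriptive names.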