I'm trying to scrape the various tables from this webpage: https://www.pro-football-reference.com/years/2020/
When inspecting the elements of the page, I found it easy to obtain the first two tables by using the following code:
### packages
library(tidyverse)
library(rvest)
### Scrape offense
url_off <- read_html("https://www.pro-football-reference.com/years/2020/")
## AFC Standings
url_off %>%
html_table(fill = TRUE) %>%
.[1] %>%
as.data.frame()
## NFC Standings
url_off %>%
html_table(fill = TRUE) %>%
.[2] %>%
as.data.frame()
Where I am stuck is every other table on that page.
For example, the offense table, I can see where it is on the page:
I've tried a few ways of extracting it without any luck. For example:
url_off %>%
html_nodes(".table_outer_container") %>%
html_nodes("#team_stats")
url_off %>%
html_nodes(".table_wrapper") %>%
html_nodes("#team_stats")
This seems to be an issue when I try and extract any of the other tables from that page. The only two tables I can get are the first two (above). I can't figure out where I am going wrong.
I've sorted it out. The data is all stored as a comment, which I think was my issue. Here is how I've extracted the tables, for anyone interested or having similar issues:
url_off %>%
html_nodes('#all_team_stats') %>%
html_nodes(xpath = 'comment()') %>%
html_text() %>%
read_html() %>%
html_node('table') %>%
html_table()
url_off %>%
html_nodes('#all_passing') %>%
html_nodes(xpath = 'comment()') %>%
html_text() %>%
read_html() %>%
html_node('table') %>%
html_table()