I'm trying to scrape what appears to be a JavaScript-rendered table, but the code below returns a data frame that has the headers and nothing in the body:
library(rvest)
library(tidyverse)

fa_link <- "https://overthecap.com/free-agency"

fa_table <- fa_link %>%
  read_html() %>%
  html_element("table") %>%
  html_table()
I need to scrape this table on a remote server, so I don't think using RSelenium (or a comparable solution) is possible. Is there a way to get the table's data with rvest?
The actual table content is fetched by an Ajax call. Once you have identified it in the Network tab of your browser's dev tools (search for a keyword from the table, e.g. "Tom Brady"), you can mimic it with httr / httr2, for example. The response contains only table rows, so once the missing HTML table pieces are added around it, it can be parsed with rvest::html_table():
library(httr2)
library(rvest)
library(stringr)

# make a POST request with action=get_free_agents&season=2023 in form data
resp <- request("https://overthecap.com/wp-admin/admin-ajax.php") %>%
  req_body_form(
    action = "get_free_agents",
    season = 2023) %>%
  req_perform()

# check response content
resp %>% resp_body_string() %>% str_trunc(80)
#> [1] "\t\t\t\t<tr class=\"sortable\" data-old-team=\"TB\" data-new-team=\"\" data-position=\"Q..."

# response includes only table rows (<tr> elements);
# fit those into the table template from the https://overthecap.com/free-agency source,
# "{.}" in "<tbody>{.}</tbody>" corresponds to resp_body_string() output
resp %>% resp_body_string() %>%
  str_glue(
    '<table class="controls-table" id="table2023" cellspacing="0" align="center">
       <thead>
         <tr>
           <th class="sortable">Player</th>
           <th class="sortable sorttable_numeric">Pos.</th>
           <th class="sortable">2022 Team</th>
           <th class="sortable">2023 Team</th>
           <th class="sortable">Type</th>
           <th class="sortable">Snaps</th>
           <th class="sortable">Age</th>
           <th class="sortable">Current APY</th>
           <th class="sortable mobile_drop">Guarantees</th>
         </tr>
       </thead>
       <tbody>{.}</tbody>
     </table>') %>%
  # wrap the fragment in a minimal, valid HTML document
  minimal_html() %>%
  html_element("table") %>%
  html_table()
Result:
#> # A tibble: 838 × 9
#>    Player          Pos.  `2022 Team` 2023 Te…¹ Type  Snaps   Age Curre…² Guara…³
#>    <chr>           <chr> <chr>       <chr>     <chr> <chr> <int> <chr>   <chr>
#>  1 Tom Brady       QB    Buccaneers  ""        Void  98.0%    46 $25,00… $25,00…
#>  2 Michael Thomas  WR    Saints      ""        Void  12.9%    30 $19,25… $35,64…
#>  3 Orlando Brown   LT    Chiefs      ""        UFA   98.4%    27 $16,66… $16,66…
#>  4 Baker Mayfield  QB    Rams        ""        UFA   63.7%    28 $15,35… $15,35…
#>  5 Deion Jones     LB    Browns      ""        UFA   38.8%    29 $14,25… $18,80…
#>  6 Marcus Peters   CB    Ravens      ""        UFA   73.2%    30 $14,00… $21,00…
#>  7 Fletcher Cox    IDL   Eagles      ""        Void  64.5%    33 $14,00… $14,00…
#>  8 Robert Quinn    EDGE  Eagles      ""        Void  35.9%    33 $14,00… $30,00…
#>  9 Javon Hargrave  IDL   Eagles      ""        Void  64.4%    30 $13,00… $26,00…
#> 10 Yannick Ngakoue EDGE  Colts       ""        Void  64.3%    28 $13,00… $21,00…
#> # … with 828 more rows, and abbreviated variable names ¹`2023 Team`,
#> #   ²`Current APY`, ³Guarantees
Created on 2023-02-11 with reprex v2.0.2
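
If you'd rather not copy the full table template from the page source, the row-wrapping step can probably be reduced to a small helper. The sketch below is only an illustration: `rows_to_tibble()` and its arguments are hypothetical names, and the sample rows are made up to stand in for the Ajax response so the example runs offline. It builds the table with `paste0()` instead of `str_glue()`, which avoids glue's `{}` interpolation entirely; that may matter if a response ever contains literal braces.

```r
library(rvest)

# Hypothetical helper (not part of rvest): wrap raw <tr> fragments in a
# minimal <table> with the given header labels, then parse with html_table().
rows_to_tibble <- function(rows_html, col_names) {
  header <- paste0("<th>", col_names, "</th>", collapse = "")
  table_html <- paste0(
    "<table><thead><tr>", header, "</tr></thead>",
    "<tbody>", rows_html, "</tbody></table>"
  )
  table_html %>%
    minimal_html() %>%        # wrap the fragment in a valid HTML document
    html_element("table") %>%
    html_table()
}

# Made-up sample rows standing in for resp_body_string(resp);
# the braces in "{Team X}" are kept literally because no glue is involved.
sample_rows <- paste0(
  '<tr class="sortable"><td>Player A</td><td>QB</td><td>{Team X}</td></tr>',
  '<tr class="sortable"><td>Player B</td><td>WR</td><td>Team Y</td></tr>'
)

fa_sample <- rows_to_tibble(sample_rows, c("Player", "Pos.", "2022 Team"))
fa_sample
```

With the real response you would call it as `rows_to_tibble(resp_body_string(resp), c("Player", "Pos.", ...))`, supplying the nine header labels shown in the template above.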