The page https://www.covers.com/sport/basketball/nba/matchup/290850/props is dynamic, and I have two problems with it:
A) I am struggling to use rvest to load each "row" of information into a table with the columns 1) Name 2) Prop value 3) Prediction value 4) Best odds 5) Analysis.
B) In R, how would you change the "props" selector and then download the same data as in A for the newly selected prop?
I have been beating my head against my monitor: I can get down to the "node level", but I am struggling to parse out the info at the lowest level.
library(rvest)
library(xml2)  # xml_find_all() comes from xml2
covers_page <- "https://www.covers.com/sport/basketball/nba/matchup/290850/props"
tmp <- read_html(covers_page)
nodes_1 <- tmp %>% html_elements("div") %>% xml_find_all("//div[contains(@class,'player-props-table-container')]")
B: When checking HTTP requests in the Network tab of the browser's dev tools, you should notice that the "props" drop-down triggers Ajax calls (e.g. ... /290850/market?propEvent=NBA_GAME_PLAYER_POINTS), fetching the table content for each prop. As rvest can't run JavaScript, the URLs have to be crafted from the drop-down item values, so first we need a list of those:
library(rvest)
library(dplyr)
library(purrr)
library(stringr)
url_ <- "https://www.covers.com/sport/basketball/nba/matchup/290850/props"
prop_events <-
  read_html(url_) |>
  html_elements("li[data-event-name]") |>
  map(\(elem) list(event = html_attr(elem, "data-event-name"),
                   descr = html_text(elem))) |>
  bind_rows()
prop_events
#> # A tibble: 12 × 2
#> event descr
#> <chr> <chr>
#> 1 NBA_GAME_PLAYER_POINTS Points Scored
#> 2 NBA_GAME_PLAYER_POINTS_REBOUNDS Points and Rebounds
#> 3 NBA_GAME_PLAYER_POINTS_ASSISTS Points and Assists
#> 4 NBA_GAME_PLAYER_3_POINTERS_MADE 3-Pointers Made
#> 5 NBA_GAME_PLAYER_REBOUNDS_ASSISTS Rebounds and Assists
#> 6 NBA_GAME_PLAYER_STEALS_BLOCKS Steals and Blocks
#> 7 NBA_GAME_PLAYER_BLOCKS Total Blocks
#> 8 NBA_GAME_PLAYER_STEALS Total Steals
#> 9 NBA_GAME_PLAYER_REBOUNDS Total Rebounds
#> 10 NBA_GAME_PLAYER_POINTS_REBOUNDS_ASSISTS Total Points, Rebounds, and Assists
#> 11 NBA_GAME_PLAYER_TURNOVERS Total Turnovers
#> 12 NBA_GAME_PLAYER_ASSISTS Total Assists
# url for props Ajax calls
(url_market <- str_replace(url_, "props$", "market?propEvent="))
#> [1] "https://www.covers.com/sport/basketball/nba/matchup/290850/market?propEvent="
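If the same data is needed for other matchups, the URL construction could be wrapped in a small helper. This is only a sketch: `market_url()` is a hypothetical name, and it assumes that other matchup pages follow the same `/matchup/<id>/market?propEvent=<event>` pattern observed for matchup 290850, which has not been verified here.

```r
# Hypothetical helper (not part of rvest or covers.com):
# builds the Ajax URL for a matchup id and one or more prop events.
# Assumes the URL pattern seen for matchup 290850 also holds for other ids.
market_url <- function(matchup_id, event) {
  # matchup_id is expected as a character string, e.g. "290850",
  # to avoid numeric ids being formatted in scientific notation
  paste0(
    "https://www.covers.com/sport/basketball/nba/matchup/",
    matchup_id,
    "/market?propEvent=",
    event
  )
}

# Vectorised over events: returns one URL per event name
event_urls <- market_url("290850", c("NBA_GAME_PLAYER_POINTS", "NBA_GAME_PLAYER_ASSISTS"))
event_urls
```

Each resulting URL can then be passed to `read_html()` in the same way as the URLs built with `str_c(url_market, event)`.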
A: You'd generally want to be more specific with your CSS selectors than just plain div. Elements returned from html_element() / html_elements() can be passed to subsequent html_element() / html_elements() calls, meaning that you can first select all articles (article.player-prop-article) and then iterate through the element list and extract the bits of interest from each individual article.
# fetch content and process rows (player-prop-article), return tibble
parse_prop <- function(event_url){
  read_html(event_url) |>
    html_elements("article.player-prop-article") |>
    map(\(art) list(
      name = html_element(art, ".player-headshot-name strong") |> html_text(),
      team = html_element(art, ".player-headshot-name > div") |> html_text() |> str_split_i("\r\n", 3) |> str_squish(),
      prop = html_element(art, ".player-props-projection-bestOdds-div > div:nth-child(1) strong") |> html_text(),
      proj = html_element(art, ".player-props-projection-bestOdds-div > div:nth-child(2) strong") |> html_text(),
      odds = html_element(art, ".player-bestOdds-row > a > div > span") |> html_text(),
      art  = html_element(art, ".player-analysis") |> html_text())) |>
    bind_rows()
}
# call parse_prop() on the first three propEvents
props <-
  prop_events$event[1:3] |>
  set_names() |>
  map(\(event) str_c(url_market, event)) |>
  map(parse_prop, .progress = TRUE) |>
  list_rbind(names_to = "prop_event")
props
#> # A tibble: 39 × 7
#> prop_event name team prop proj odds art
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 NBA_GAME_PLAYER_POINTS Ja Morant PG • Memph… 25.5 22.4 -120 Offe…
#> 2 NBA_GAME_PLAYER_POINTS Jaren Jackson Jr. PF • Memph… 18.5 21.5 -125 Jare…
#> 3 NBA_GAME_PLAYER_POINTS Jonas Valanciunas C • New Or… 15.5 14.2 -114 Out …
#> 4 NBA_GAME_PLAYER_POINTS Vince Williams Jr. SG • Memph… 6.5 8.2 -150 Vinc…
#> 5 NBA_GAME_PLAYER_POINTS CJ McCollum SG • New O… 17.5 19.4 -125 CJ M…
#> 6 NBA_GAME_PLAYER_POINTS Santi Aldama PF • Memph… 8.5 9.5 -110 The …
#> 7 NBA_GAME_PLAYER_POINTS Herbert Jones PF • New O… 9.5 10.2 -110 Herb…
#> 8 NBA_GAME_PLAYER_POINTS Trey Murphy III SF • New O… 12.5 13.5 -130 Amon…
#> 9 NBA_GAME_PLAYER_POINTS Bismack Biyombo C • Memphis 6.5 6 -140 Bism…
#> 10 NBA_GAME_PLAYER_POINTS David Roddy SF • Memph… 7.5 7.9 -106 Davi…
#> # ℹ 29 more rows
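The nested-selection pattern used in parse_prop() can be tried offline on a small inline document. The markup and class names below are made up for illustration and are not taken from covers.com; the point is only that a selector passed to html_element() on a single article is scoped to that article, so the stray `.name` element outside the articles is ignored.

```r
library(rvest)
library(purrr)
library(dplyr)

# Made-up markup; class names are illustrative only
html <- minimal_html('
  <h1 class="name">Page title</h1>
  <article class="row"><strong class="name">Player A</strong><span class="odds">-120</span></article>
  <article class="row"><strong class="name">Player B</strong><span class="odds">-110</span></article>
')

demo <-
  html_elements(html, "article.row") |>
  # each `row` is a single <article>; selectors below search only inside it
  map(\(row) list(
    name = html_element(row, ".name") |> html_text(),
    odds = html_element(row, ".odds") |> html_text())) |>
  bind_rows()

demo
```

The result has one row per article, with `name` and `odds` taken from within that article only.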
Perhaps a somewhat more common approach is to extract column vectors from the document / parent element and combine those into a data.frame / tibble, something like this:
html <- read_html("https://www.covers.com/sport/basketball/nba/matchup/290850/market?propEvent=NBA_GAME_PLAYER_POINTS")
tibble(
  name = html_elements(html, ".player-headshot-name strong") |> html_text(),
  prop = html_elements(html, ".player-props-projection-bestOdds-div > div:nth-child(1) strong") |> html_text(),
  proj = html_elements(html, ".player-props-projection-bestOdds-div > div:nth-child(2) strong") |> html_text()
)
While it also tends to be faster than iterating over elements, it's somewhat less robust, as it only works when there's no chance that the input vectors could end up with different lengths.
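To make the length problem concrete, here is a small offline sketch. The markup is made up for illustration (it is not the covers.com structure): the second row has no odds element, so the document-level vectors no longer line up, while selecting rows first and then calling html_element() on the row set keeps one value per row and fills the gap with NA.

```r
library(rvest)

# Made-up markup; the second article deliberately lacks an odds element
html <- minimal_html('
  <article class="row"><strong class="name">Player A</strong><span class="odds">-120</span></article>
  <article class="row"><strong class="name">Player B</strong></article>
  <article class="row"><strong class="name">Player C</strong><span class="odds">+105</span></article>
')

# Document-level extraction: vectors of different lengths,
# and no way to tell which player the missing value belongs to
names_vec <- html_elements(html, ".name") |> html_text()
odds_vec  <- html_elements(html, ".odds") |> html_text()
c(names = length(names_vec), odds = length(odds_vec))

# Row-first extraction: html_element() returns exactly one result per row,
# and html_text() yields NA where the element is missing
rows     <- html_elements(html, "article.row")
odds_row <- html_element(rows, ".odds") |> html_text()
odds_row
```

The row-first variant stays vectorised, so it keeps most of the speed advantage while still aligning values with their rows.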