I am trying to scrape a single table from the following URL: https://baseballsavant.mlb.com/league?season=2023#statcastHitting. However, my attempts either scrape multiple tables from the broader page or return an empty 0x0 tibble. I simply want to scrape the "Statcast Hitting" table near the top of the page, without the spanner titles. I used Selector Gadget to try to pinpoint the correct nodes, but I suspect I am not referencing the HTML properly. An image of what I'm trying to scrape and convert to a data frame is shown below.
I have tried several times, with a couple of failed attempts below.
library(rvest)
library(tidyverse)
url <- 'https://baseballsavant.mlb.com/league?season=2023#statcastHitting'
savant_teams <- url %>% read_html %>% html_node('#statcastHitting') %>%
html_table()
savant_teams
library(tidyverse)
library(rvest)
url <- 'https://baseballsavant.mlb.com/league?season=2023#statcastHitting'
savant_teams <- url %>% read_html %>% html_node('#statcast_th-8 .tablesorter-header-inner , #statcast_th-7 , #statcast_th-6 .tablesorter-header-inner , #statcast_th-5 .tablesorter-header-inner , #statcast_th-10 .tablesorter-header-inner , #statcast_th-4 .tablesorter-header-inner , #statcast_th-2 .tablesorter-header-inner , #statcast_th-9 , #statcast_th-1 .tablesorter-header-inner , #statcast_th-3 .tablesorter-header-inner , .tablesorterb23e763259572 #statcast_th-0 .tablesorter-header-inner , #scg_ span') %>%
html_table()
savant_teams
The tables are wrapped in a div with class table-savant. Since the table you want to scrape is the first one, you can use the selector #statcastHitting div.table-savant with html_node(), which returns only the first matching div, so that html_table() extracts only the table inside that div:
library(rvest)
library(tidyverse)
url <- 'https://baseballsavant.mlb.com/league?season=2023#statcastHitting'
savant_teams <- url %>%
  read_html() %>%
  html_node('#statcastHitting div.table-savant') %>%
  html_table()
savant_teams
#> # A tibble: 32 × 27
#> `` `` `Standard Stats` `Standard Stats` `Standard Stats`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 "Team" Season PA AB H
#> 2 "" 2023 6,249 5,597 1,543
#> 3 "" 2023 5,985 5,428 1,325
#> 4 "" 2023 6,207 5,541 1,417
#> 5 "" 2023 5,980 5,501 1,308
#> 6 "" 2023 6,253 5,567 1,441
#> 7 "" 2023 6,219 5,489 1,336
#> 8 "" 2023 5,966 5,311 1,187
#> 9 "" 2023 6,164 5,511 1,432
#> 10 "" 2023 6,180 5,401 1,316
#> # ℹ 22 more rows
#> # ℹ 22 more variables: `Standard Stats` <chr>, `Standard Stats` <chr>,
#> # `Standard Stats` <chr>, `Standard Stats` <chr>, `Standard Stats` <chr>,
#> # `Standard Stats` <chr>, `Standard Stats` <chr>, `Standard Stats` <chr>,
#> # `Standard Stats` <chr>, `Standard Stats` <chr>, Statcast <chr>,
#> # Statcast <chr>, Statcast <chr>, Statcast <chr>, Statcast <chr>,
#> # Statcast <chr>, Statcast <chr>, Statcast <chr>, Statcast <chr>, …
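In the output above, the spanner titles ("Standard Stats", "Statcast") end up as the column names and the real headers ("Team", "Season", "PA", ...) sit in the first data row. Since the question asks for the table without spanner titles, one possible follow-up step is to promote that first row to column names and drop it. The following is only a minimal sketch using base R: promote_header() is a hypothetical helper name, and the small raw data frame is a made-up stand-in for savant_teams (its values are copied from the printed output) so the example runs without hitting the site.

```r
# Hypothetical helper (not part of rvest): use the first data row as the
# column names, then drop that row from the table.
promote_header <- function(df) {
  # Take the first value of every column as its header text
  header <- vapply(df, function(col) as.character(col[[1]]),
                   character(1), USE.NAMES = FALSE)
  # Give blank or missing header cells a placeholder name
  blank <- is.na(header) | header == ""
  header[blank] <- paste0("col_", which(blank))
  out <- df[-1, , drop = FALSE]
  # make.unique() guards against repeated header labels
  names(out) <- make.unique(header)
  out
}

# Toy stand-in for savant_teams: spanner-style column names, with the real
# headers in row 1 (assumed shape, based on the printed output above)
raw <- data.frame(
  a = c("Team", "", ""),
  b = c("Season", "2023", "2023"),
  c = c("PA", "6,249", "5,985"),
  stringsAsFactors = FALSE
)

clean <- promote_header(raw)

# The columns are still character; strip the thousands separator before
# converting a count column to numeric
clean$PA <- as.numeric(gsub(",", "", clean$PA))
clean
```

With the real data this would be called as promote_header(savant_teams). All columns come back as character, so any numeric column needs the same kind of conversion as PA here.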