I am trying to scrape two tables from the npb.jp website using the rvest package in R. I have tried CSS selectors for both tables, but neither one returns the table. Could the issue lie in how the webpage is structured?
Code:
library(rvest)
html <- read_html("https://npb.jp/bis/eng/2022/stats/std_c.html")
css <- "#stdivmaintbl > table > tbody > tr > td > div:nth-child(1)"
nodes <- html_nodes(html, css)
table <- html_table(nodes)[[1]]
df <- data.frame(table)
The code reads in the HTML but cannot seem to find the table.
I'd appreciate any assistance.
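One thing worth checking, as a guess I have not verified against this particular page: selectors copied from a browser's inspector often contain `tbody`, because browsers insert that element into the DOM even when the raw HTML does not have it. The parser behind read_html() does not add it, so a selector that goes through `tbody` can match nothing. The sketch below shows the effect on a made-up HTML snippet (the `wrap` id and `demo` class are hypothetical, not taken from npb.jp):

```r
library(rvest)

# Toy page: a table written without <tbody>, as raw HTML often is.
# The id "wrap" and class "demo" are invented for this illustration.
page <- read_html(
  '<div id="wrap"><table class="demo">
     <tr><th>Team</th><th>W</th></tr>
     <tr><td>A</td><td>80</td></tr>
   </table></div>'
)

# Browser-style selector that assumes a <tbody> element exists
with_tbody <- html_nodes(page, "#wrap > table > tbody > tr")

# Same path without the <tbody> step
without_tbody <- html_nodes(page, "#wrap > table > tr")

length(with_tbody)     # expected to be 0: there is no <tbody> in the parsed tree
length(without_tbody)  # expected to be 2: both rows are found

# Selecting the <table> element itself is usually the simplest route
demo <- html_table(html_nodes(page, "table.demo"))[[1]]
```

Note also that html_table() expects `<table>` nodes, while the selector in the question ends in a `div`, so it may be worth selecting the table element directly.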
When I tried to read the URL directly I got a certificate error, so I copied the page's source HTML into a local file and read that instead. What I read from the file should be the same as what you get from the internet. This worked for me:
library(rvest)
library(magrittr)
# this is where I saved the page's html
# assuming you don't have the same certificate problem I had,
# you could use this instead: url <- "https://npb.jp/bis/eng/2022/stats/std_c.html"
url <- "baseball.html"
table <- url %>% read_html() %>% html_nodes(".stdtblmain") %>% html_table()
table[[1]]
> table[[1]]
# A tibble: 27 × 239
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "TeamGWL… "Tea… G W L T PCT "GB" "" Home Road "" "vsS" vsDB vsT vsG vsC vsD Int Toky…
2 "Team" "G" W L T PCT GB "" "" Home Road "" "" vsS vsDB vsT vsG vsC vsD Int
3 "Tokyo Y… "" Toky… 143 80 59 4 "" "" .576 -- "" "" 37-34 43-2… *** 16-9 13-1… 11-1… 16-8…
4 "" "Tok… NA NA NA NA NA "" "" NA NA "" "" NA NA NA NA NA NA NA
5 "YOKOHAM… "" YOKO… 143 73 68 2 "" "" .518 8.0 "" "" 41-3… 32-3… 9-16 *** 16-9 13-1… 8-17
6 "" "YOK… NA NA NA NA NA "" "" NA NA "" "" NA NA NA NA NA NA NA
7 "Hanshin… "" Hans… 143 68 71 4 "" "" .489 12.0 "" "" 37-3… 31-3… 11-1… 9-16 *** 14-1… 9-14…
8 "" "Han… NA NA NA NA NA "" NA NA NA NA "" NA NA NA NA NA NA NA
9 "Yomiuri… "" Yomi… 143 68 72 3 ".48… "12.… 35-3… 33-3… "13-… "11-… 10-1… *** 13-12 13-12 8-10 NA NA
10 "" "Yom… NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
# … with 17 more rows, and 219 more variables: X21 <chr>, X22 <chr>,
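If you hit the same certificate error and would rather not save the page by hand, one possible workaround is to download it with httr and turn off certificate verification for that single request. This is only a sketch: `read_html_insecure` is a helper name I made up, I have not confirmed it resolves the specific error above, and skipping verification removes a security check, so only consider it for a public page you trust.

```r
library(rvest)
library(httr)

# Hypothetical helper: fetch a page without verifying the TLS certificate,
# then hand the raw bytes to read_html() so it can detect the encoding.
read_html_insecure <- function(url) {
  resp <- GET(url, config(ssl_verifypeer = FALSE))  # skips certificate checks
  stop_for_status(resp)                             # fail loudly on HTTP errors
  read_html(content(resp, as = "raw"))
}

# Usage (needs network access, so left commented out here):
# page <- read_html_insecure("https://npb.jp/bis/eng/2022/stats/std_c.html")
# tables <- page %>% html_nodes(".stdtblmain") %>% html_table()
```

From there the rest of the code above should work unchanged, since it only needs the parsed document.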