Tags: r, json, web-scraping, rvest

How to scrape data from JSON in R using rvest?


I am trying to scrape the fixture list from this website:

https://www.nrl.com/draw/?competition=111&round=1&season=2024

The output should be:

Sea Eagles, Rabbitohs

Roosters, Broncos

Knights, Raiders, etc.

I have written the following code:


library(rvest)

url <- "https://www.nrl.com/draw/?competition=111&round=1&season=2024"

page <- read_html(url)

contentnodes <- page %>%
  html_nodes("div.u-spacing-mt-24.pre-quench") %>%
  html_attr("q-data") %>%
  jsonlite::fromJSON()

but I am getting the following error:

lexical error: invalid char in json text NA

From what I have read online, some people suggest the data is HTML rather than JSON. However, I have scraped a different page on the same website with similar code, so I am not sure what has gone wrong here.


Solution
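
The `NA` at the end of the error message is a hint: `jsonlite::fromJSON()` was handed a missing value rather than a JSON string. `html_attr()` returns `NA` when a matched element does not carry the requested attribute, so one plausible explanation is that the `div` on this page has no `q-data` attribute. The sketch below reproduces that situation offline. The inline HTML is a made-up stand-in, not the real page markup.

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for the page: the div matches the selector
# but has no q-data attribute (an assumption, not the real markup).
page <- read_html(
  '<html><body><div class="u-spacing-mt-24 pre-quench"></div></body></html>'
)

nodes <- html_nodes(page, "div.u-spacing-mt-24.pre-quench")
raw_value <- html_attr(nodes, "q-data")

# html_attr() yields NA when the attribute is missing
print(is.na(raw_value))

# Handing that NA to fromJSON() is expected to fail;
# capture the message instead of stopping the script
result <- tryCatch(fromJSON(raw_value), error = function(e) conditionMessage(e))
print(result)

# Guard: only parse when exactly one non-missing value was found
if (length(raw_value) == 1 && !is.na(raw_value)) {
  parsed <- fromJSON(raw_value)
}
```

Checking the result of `html_attr()` before parsing makes it obvious whether the problem is the selector, the attribute, or the JSON itself.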

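    library(tidyverse)
    library(httr2)
    
    # Request the draw data as JSON instead of scraping the HTML page
    "https://www.nrl.com/draw//data?competition=111&season=2024" %>%
      request() %>%
      req_perform() %>%
      # Parse the body; nested objects become data-frame columns
      resp_body_json(simplifyVector = TRUE) %>%
      pluck("fixtures") %>%
      # Spread homeTeam and awayTeam into prefixed columns
      unnest(c(homeTeam, awayTeam), names_sep = "_") %>%
      # Keep only the team nicknames and the odds
      select(contains("nickName"),
             contains("odds"))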
    
    # A tibble: 8 × 4
      homeTeam_nickName awayTeam_nickName homeTeam_odds awayTeam_odds
      <chr>             <chr>             <chr>         <chr>        
    1 Sea Eagles        Rabbitohs         2.17          1.69         
    2 Roosters          Broncos           2.51          1.53         
    3 Knights           Raiders           1.42          2.87         
    4 Warriors          Sharks            1.60          2.34         
    5 Storm             Panthers          2.24          1.65         
    6 Eels              Bulldogs          1.47          2.70         
    7 Titans            Dragons           1.49          2.64         
    8 Dolphins          Cowboys           2.67          1.48
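
To get from that tibble to the `Home, Away` lines asked for in the question, the two nickname columns can be pasted together. The sketch below runs offline on a small hand-written JSON sample whose field names (`fixtures`, `homeTeam`, `awayTeam`, `nickName`, `odds`) and values are copied from the output above; the real response presumably carries more fields. It uses base R indexing instead of `unnest()`, and also converts the odds, which arrive as character strings, to numbers.

```r
library(jsonlite)

# Hand-written sample shaped like the "fixtures" element (an assumption
# based on the column names and values shown in the output above).
sample_json <- '{
  "fixtures": [
    {"homeTeam": {"nickName": "Sea Eagles", "odds": "2.17"},
     "awayTeam": {"nickName": "Rabbitohs",  "odds": "1.69"}},
    {"homeTeam": {"nickName": "Roosters",   "odds": "2.51"},
     "awayTeam": {"nickName": "Broncos",    "odds": "1.53"}}
  ]
}'

# With the default simplification, homeTeam and awayTeam become
# data-frame columns inside the fixtures data frame.
fixtures <- fromJSON(sample_json)$fixtures

# One "Home, Away" string per fixture
pairs <- paste(fixtures$homeTeam$nickName, fixtures$awayTeam$nickName, sep = ", ")
print(pairs)

# Odds are strings in the sample; convert them for numeric work
home_odds <- as.numeric(fixtures$homeTeam$odds)
print(home_odds)
```

The same `paste()` call should work on the `homeTeam_nickName` and `awayTeam_nickName` columns of the tibble above.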