r dplyr tidyr purrr

Map a list and extract elements to create a data frame in R


This is similar to the question I posted here.

I use the nhlapi package and the nhl_schedule_seasons function to obtain a list of games and their associated info for a specific season.

Using this for season 2023 with the following:

install.packages("nhlapi")
library(nhlapi)

schedule <- nhl_schedule_seasons(2023)

This returns a list, and inside it I can see the game info:

str(schedule, list.len = 8)

List of 1
 $ :List of 8
  ..$ copyright   : chr "NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the "| __truncated__
  ..$ totalItems  : int 1423
  ..$ totalEvents : int 0
  ..$ totalGames  : int 1423
  ..$ totalMatches: int 0
  ..$ metaData    :List of 1
  .. ..$ timeStamp: chr "20231016_233752"
  ..$ wait        : int 10
  ..$ dates       :'data.frame':    198 obs. of  8 variables:
  .. ..$ date        : chr [1:198] "2023-09-23" "2023-09-24" "2023-09-25" "2023-09-26" ...
  .. ..$ totalItems  : int [1:198] 3 12 9 7 9 6 8 10 4 7 ...
  .. ..$ totalEvents : int [1:198] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ totalGames  : int [1:198] 3 12 9 7 9 6 8 10 4 7 ...
  .. ..$ totalMatches: int [1:198] 0 0 0 0 0 0 0 0 0 0 ...
  .. ..$ games       :List of 198
  .. .. ..$ :'data.frame':  3 obs. of  30 variables:

How can I extract the specific game info from the games list?

I tried to map with the following:

library(purrr)
library(dplyr)
library(tibble)
library(tidyr)

schedule <- nhl_schedule_seasons(2023) |>
  map(list("dates", "games"))

But I can't work out how to use enframe, and probably list_rbind, to extract all of the info into a single data frame.

I can access each of the dataframes like so:

newdf1 <- as.data.frame(schedule[[1]][[1]])
head(newdf1)
      gamePk                              link gameType   season             gameDate
1 2023010001 /api/v1/game/2023010001/feed/live       PR 20232024 2023-09-23T04:05:00Z
2 2023010002 /api/v1/game/2023010002/feed/live       PR 20232024 2023-09-23T19:00:00Z
3 2023010003 /api/v1/game/2023010003/feed/live       PR 20232024 2023-09-24T00:00:00Z

But I can't work out how to get these for every element of the list.

I also tried a loop with the following:

schedule <- nhl_schedule_seasons(2023) |>
  map(list("dates", "games"))

df = data.frame()

for (i in 1:198) {
  
  res = as.data.frame(schedule[[1]][[i]])
    bind_rows()
  
  df = rbind(df, res)
}

But I get this error:

Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

Solution

  • Does this achieve what you want?

    schedule <- nhl_schedule_seasons(2023) |>
      map(list("dates", "games"))
    
    res <- bind_rows(map(schedule, ~ bind_rows(.x)))
    

    If you want columns in the new data frame that record which list element each row came from, add .id = "..." to each bind_rows call. Because the lists here are unnamed, those columns will hold each element's position rather than a name:

    bind_rows(map(schedule, ~ bind_rows(.x, .id = "inner_list")), .id = "outer_list")
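
To see why the loop with rbind failed while bind_rows works, here is a minimal sketch on toy data. The toy_schedule object and its values are made up for illustration; it only mimics the shape of the mapped schedule (one outer element holding per-date data frames whose columns differ). The last step assumes purrr 1.0.0 or later, where list_flatten and list_rbind are available.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for the mapped schedule: one outer element that
# holds per-date data frames, and the second one has an extra column.
toy_schedule <- list(
  list(
    data.frame(gamePk = c(1L, 2L), gameType = c("PR", "PR")),
    data.frame(gamePk = 3L, gameType = "PR", status = "Final")
  )
)

# Base rbind() needs identical columns in every data frame, so it stops
# with an error here; capture the message instead of aborting.
rbind_error <- tryCatch(
  do.call(rbind, toy_schedule[[1]]),
  error = function(e) conditionMessage(e)
)

# bind_rows() matches columns by name and fills missing ones with NA.
res <- bind_rows(map(toy_schedule, bind_rows))

# Alternative (assumes purrr >= 1.0.0): drop one level of nesting,
# then row-bind the remaining data frames.
res2 <- toy_schedule |>
  list_flatten() |>
  list_rbind()

print(rbind_error)
print(res)
```

Both res and res2 should contain all three toy games, with status set to NA for the rows that came from the data frame without that column.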