Converting a nested (3-level) list to a long/tall format data frame


I have a nested list with 3 levels:

m = list(try1 = list(list(court = c("jack", "queen", "king"),
                          suit = list(diamonds = 2, clubs = 5)), 
                     list(court = c("jack", "queen", "king"),
                          suit = list(diamonds = 45, clubs = 67))), 
         try2 = list(list(court = c("jack", "queen", "king"),
                          suit = list(diamonds = 400, clubs = 300)), 
                     list(court = c("jack", "queen", "king"),
                          suit = list(diamonds = 5000, clubs = 6000))))

> str(m)
List of 2
 $ try1:List of 2
  ..$ :List of 2
  .. ..$ court: chr [1:3] "jack" "queen" "king"
  .. ..$ suit :List of 2
  .. .. ..$ diamonds: num 2
  .. .. ..$ clubs   : num 5
  ..$ :List of 2
  .. ..$ court: chr [1:3] "jack" "queen" "king"
  .. ..$ suit :List of 2
  .. .. ..$ diamonds: num 45
  .. .. ..$ clubs   : num 67
 $ try2:List of 2
  ..$ :List of 2
  .. ..$ court: chr [1:3] "jack" "queen" "king"
  .. ..$ suit :List of 2
  .. .. ..$ diamonds: num 400
  .. .. ..$ clubs   : num 300
  ..$ :List of 2
  .. ..$ court: chr [1:3] "jack" "queen" "king"
  .. ..$ suit :List of 2
  .. .. ..$ diamonds: num 5000
  .. .. ..$ clubs   : num 6000

For each sublist in try1 and try2, I need to extract the suit sublist and rbind its elements so that the result is a long-format data frame with 4 columns: value (the value of the suit), suit (which suit the value comes from, i.e. diamonds or clubs), iter (the position of the sublist the suit belongs to, i.e. 1 or 2) and try (try1 or try2).

I could achieve this using a combination of expand.grid() and mapply():

grd = expand.grid(try = names(m), iter = 1:2, suit = c("diamonds", "clubs"))

grd$value = mapply(function(x, y, z) m[[x]][[y]]$suit[[z]], grd[[1]], grd[[2]], grd[[3]])

The result:

> grd
   try iter     suit value
1 try1    1 diamonds     2
2 try2    1 diamonds   400
3 try1    2 diamonds    45
4 try2    2 diamonds  5000
5 try1    1    clubs     5
6 try2    1    clubs   300
7 try1    2    clubs    67
8 try2    2    clubs  6000

However, I was wondering if there was a more general/concise way of reproducing the above result (preferably in base R)? I was thinking about extracting the suit element from each sublist and then using something like stack() recursively on the resulting list:

rapply(m, function(x) setNames(stack(x), names(x)))

But this throws an error. I'm not quite sure why, and I don't know what to use in its place.


Solution

  • We could use a combination of map and melt:

    library(purrr)
    library(reshape2)
    library(dplyr)
    map_df(m, ~ .x %>%
                 map(pluck, "suit") %>%
                 melt,
           .id = 'try')


    Or with enframe and map:

    library(tibble)
    library(tidyr)   # provides unnest(), which the pipeline below calls
    map_df(m, ~ .x %>%
                 map_df(pluck, "suit") %>%
                 map_df(~ enframe(.x, name = "iter") %>%
                          unnest, .id = "suit"),
           .id = 'try')
    # A tibble: 8 x 4
    #  try   suit      iter value
    #  <chr> <chr>    <int> <dbl>
    #1 try1  diamonds     1     2
    #2 try1  diamonds     2    45
    #3 try1  clubs        1     5
    #4 try1  clubs        2    67
    #5 try2  diamonds     1   400
    #6 try2  diamonds     2  5000
    #7 try2  clubs        1   300
    #8 try2  clubs        2  6000
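
    Since the question asks for base R, here is a minimal sketch of one way to get the same long format without any packages. It is only one possible approach, not a canonical one. The variable names tr, i, s and long are made up for illustration, and m is repeated from the question so the block runs on its own. Note that rapply() calls its function on the non-list leaves (the court vectors and the individual numbers), not on the suit sublists, which is likely why the stack() attempt in the question fails. The sketch therefore indexes down to each suit list explicitly and calls stack() there.

```r
# Data copied from the question so the example is self-contained.
m <- list(try1 = list(list(court = c("jack", "queen", "king"),
                           suit = list(diamonds = 2, clubs = 5)),
                      list(court = c("jack", "queen", "king"),
                           suit = list(diamonds = 45, clubs = 67))),
          try2 = list(list(court = c("jack", "queen", "king"),
                           suit = list(diamonds = 400, clubs = 300)),
                      list(court = c("jack", "queen", "king"),
                           suit = list(diamonds = 5000, clubs = 6000))))

# Outer loop over the try names, inner loop over sublist positions.
# stack() turns a named list of numbers into a two-column data frame
# (values, ind); the pieces are then row-bound together.
long <- do.call(rbind, lapply(names(m), function(tr) {
  do.call(rbind, lapply(seq_along(m[[tr]]), function(i) {
    s <- stack(m[[tr]][[i]]$suit)
    data.frame(try   = tr,
               iter  = i,
               suit  = as.character(s$ind),
               value = s$values)
  }))
}))

long
```

    This assumes every sublist has a suit element that is a named list of numbers, as in the example data. The rows come out ordered by try, then iter, then suit, which differs from the expand.grid() ordering in the question but holds the same eight values.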