Search code examples
rdataframelistdplyrtidyverse

Expanding data frame which contain list in R


I have a data frame which contains list inside it.

df <- data.frame(
      id=c(1:4),
      a=I(list(c(1,"a1"),2,c("a31","a32","a33"),"a4")),
      b=I(list(2,c("b1","b2",3),c("b3","b4"),4))
    ); print(df)

  id            a         b
1  1        1, a1         2
2  2            2 b1, b2, 3
3  3 a31, a32....    b3, b4
4  4           a4         4

Now, I need to unnest the list to obtain data frame like this:

df2 <- data.frame(
      id=c(1,1,2,2,2,3,3,3,3,3,3,4),
      a=c(1,"a1",2,2,2,"a31","a31","a32","a32","a33","a33","a4"),
      b=c(2,2,"b1","b2",3,"b3","b3","b3","b4","b4","b4",4)
    ) ; print(df2)

   id   a  b
1   1   1  2
2   1  a1  2
3   2   2 b1
4   2   2 b2
5   2   2  3
6   3 a31 b3
7   3 a31 b3
8   3 a32 b3
9   3 a32 b4
10  3 a33 b4
11  3 a33 b4
12  4  a4  4

I used to use unnest() for those which contains the same number of list elements in some rows/columns, but current data frame contains different number of elements in some rows and columns. Currently I am facing the following error.

> target <- c("id","a","b")
> df %>% unnest(cols=target)
Error in `unnest()`:
! In row 3, can't recycle input of size 3 to size 2.
Run `rlang::last_trace()` to see where the error occurred.

Due to unpredictability of where it happens (rowwise/columnwise) and how many elements it will contain, I cannot find appropriate approach to tackle with this. The number of columns and its names cannot be determined in advance.

I appreciate your suggestion especially simple ones which can be integrated into current pipe operation in dplyr. Base R and other approaches are also welcomed.

====

Rreproducible example data

Let me share a reproductible example of the actual data frame that I am working on. It did not work with solutions from @ThomasIsCoding. I suspect it caused by NULL but it was not the reason. Blank/Null is not necessarily replicated.

structure(list(cluster = c("1", "2", "3", "4", "5", "6"), st_sub_main_th = list(
    "hira", NULL, "tsuma", "tsuma", NULL, c("other", "hira")), 
    roo_main = list("2", "4", "3", "2", c("1", "3"), c("6", "7", 
    "2", "1")), st_con_rt = list("sub-room", "main-room", "sub-room", 
        "sub-room", "main-room", "sub-room"), st_con_tr = list(
        "terrace", c("terrace", "direct"), "terrace", "terrace", 
        "terrace", "direct"), st_adsb = list("add", "add", "add", 
        "sub", "add", "sub"), st_th = list(NULL, "tsuma", NULL, 
        NULL, "hira", NULL), st_sub2_main_th = list(NULL, NULL, 
        NULL, "hira", "hira", "tsuma"), isstilt = list(NULL, 
        NULL, NULL, NULL, NULL, "0")), class = "data.frame", row.names = c(NA, 
-6L))

Solution

  • Use tidyverse:

    library(tidyverse)
    exec(pmap_df, df, expand_grid)
    
    # A tibble: 15 × 9
       cluster st_sub_main_th roo_main st_con_rt st_con_tr st_adsb st_th st_sub2_main_th isstilt
       <chr>   <chr>          <chr>    <chr>     <chr>     <chr>   <chr> <chr>           <chr>  
     1 1       hira           2        sub-room  terrace   add     NA    NA              NA     
     2 2       NA             4        main-room terrace   add     tsuma NA              NA     
     3 2       NA             4        main-room direct    add     tsuma NA              NA     
     4 3       tsuma          3        sub-room  terrace   add     NA    NA              NA     
     5 4       tsuma          2        sub-room  terrace   sub     NA    hira            NA     
     6 5       NA             1        main-room terrace   add     hira  hira            NA     
     7 5       NA             3        main-room terrace   add     hira  hira            NA     
     8 6       other          6        sub-room  direct    sub     NA    tsuma           0      
     9 6       other          7        sub-room  direct    sub     NA    tsuma           0      
    10 6       other          2        sub-room  direct    sub     NA    tsuma           0      
    11 6       other          1        sub-room  direct    sub     NA    tsuma           0      
    12 6       hira           6        sub-room  direct    sub     NA    tsuma           0      
    13 6       hira           7        sub-room  direct    sub     NA    tsuma           0      
    14 6       hira           2        sub-room  direct    sub     NA    tsuma           0      
    15 6       hira           1        sub-room  direct    sub     NA    tsuma           0  
    

    Note that the above is equivalent to:

    bind_rows(exec(pmap, df, expand_grid))
    

    In case you have mixed datatypes, consider the following:

    exec(rbind, !!!exec(pmap, df, expand_grid))