I have a data frame which contains list inside it.
df <- data.frame(
id=c(1:4),
a=I(list(c(1,"a1"),2,c("a31","a32","a33"),"a4")),
b=I(list(2,c("b1","b2",3),c("b3","b4"),4))
); print(df)
id a b
1 1 1, a1 2
2 2 2 b1, b2, 3
3 3 a31, a32.... b3, b4
4 4 a4 4
Now, I need to unnest
the list to obtain data frame like this:
df2 <- data.frame(
id=c(1,1,2,2,2,3,3,3,3,3,3,4),
a=c(1,"a1",2,2,2,"a31","a31","a32","a32","a33","a33","a4"),
b=c(2,2,"b1","b2",3,"b3","b3","b3","b4","b4","b4",4)
) ; print(df2)
id a b
1 1 1 2
2 1 a1 2
3 2 2 b1
4 2 2 b2
5 2 2 3
6 3 a31 b3
7 3 a31 b3
8 3 a32 b3
9 3 a32 b4
10 3 a33 b4
11 3 a33 b4
12 4 a4 4
I used to use unnest()
for those which contains the same number of list elements in some rows/columns, but current data frame contains different number of elements in some rows and columns. Currently I am facing the following error.
> target <- c("id","a","b")
> df %>% unnest(cols=target)
Error in `unnest()`:
! In row 3, can't recycle input of size 3 to size 2.
Run `rlang::last_trace()` to see where the error occurred.
Due to unpredictability of where it happens (rowwise/columnwise) and how many elements it will contain, I cannot find appropriate approach to tackle with this. The number of columns and its names cannot be determined in advance.
I appreciate your suggestion especially simple ones which can be integrated into current pipe operation in dplyr
. Base R
and other approaches are also welcomed.
====
Rreproducible example data
Let me share a reproductible example of the actual data frame that I am working on. It did not work with solutions from @ThomasIsCoding. I suspect it caused by NULL but it was not the reason. Blank/Null is not necessarily replicated.
structure(list(cluster = c("1", "2", "3", "4", "5", "6"), st_sub_main_th = list(
"hira", NULL, "tsuma", "tsuma", NULL, c("other", "hira")),
roo_main = list("2", "4", "3", "2", c("1", "3"), c("6", "7",
"2", "1")), st_con_rt = list("sub-room", "main-room", "sub-room",
"sub-room", "main-room", "sub-room"), st_con_tr = list(
"terrace", c("terrace", "direct"), "terrace", "terrace",
"terrace", "direct"), st_adsb = list("add", "add", "add",
"sub", "add", "sub"), st_th = list(NULL, "tsuma", NULL,
NULL, "hira", NULL), st_sub2_main_th = list(NULL, NULL,
NULL, "hira", "hira", "tsuma"), isstilt = list(NULL,
NULL, NULL, NULL, NULL, "0")), class = "data.frame", row.names = c(NA,
-6L))
Use tidyverse:
library(tidyverse)
exec(pmap_df, df, expand_grid)
# A tibble: 15 × 9
cluster st_sub_main_th roo_main st_con_rt st_con_tr st_adsb st_th st_sub2_main_th isstilt
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 hira 2 sub-room terrace add NA NA NA
2 2 NA 4 main-room terrace add tsuma NA NA
3 2 NA 4 main-room direct add tsuma NA NA
4 3 tsuma 3 sub-room terrace add NA NA NA
5 4 tsuma 2 sub-room terrace sub NA hira NA
6 5 NA 1 main-room terrace add hira hira NA
7 5 NA 3 main-room terrace add hira hira NA
8 6 other 6 sub-room direct sub NA tsuma 0
9 6 other 7 sub-room direct sub NA tsuma 0
10 6 other 2 sub-room direct sub NA tsuma 0
11 6 other 1 sub-room direct sub NA tsuma 0
12 6 hira 6 sub-room direct sub NA tsuma 0
13 6 hira 7 sub-room direct sub NA tsuma 0
14 6 hira 2 sub-room direct sub NA tsuma 0
15 6 hira 1 sub-room direct sub NA tsuma 0
Note that the above is equivalent to:
bind_rows(exec(pmap, df, expand_grid))
In case you have mixed datatypes, consider the following:
exec(rbind, !!!exec(pmap, df, expand_grid))