Search code examples
rdplyrfull-join

Merging three dataframe results in error: Can't join due to incompatible types (list / integer) in R


I have three dataframes. I would like to merge them over their common variables -- state_num and year.

x=structure(list(x1 = list(NA, NA, NA, 0.0682483691857418, 0.0722250238759167, 
    NA, 0.0845019820471355, 0.0420250458245735, NA, 0.068867373870291), 
    x2 = list(NA, NA, NA, 0.0214527829310878, 0.0116780721193381, 
        NA, 0.0214444646549374, 0.0165359954773674, NA, 0.0247734009402538), 
    state_num = list(1, 2, 4, 5, 6, 8, 9, 10, 11, 12), year = list(
        2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 
        2014L, 2014L)), row.names = c(NA, 10L), class = "data.frame")

y = structure(list(y1 = list(NA, NA, NA, 0, 0.0127529380732448, NA, 
    0.00653696511520211, 0.0233385543680627, NA, 0), y2 = list(
    NA, NA, NA, 0, 0.00484594815594003, NA, 0.00380286437912703, 
    0.0118211932846739, NA, 0), state_num = list(1, 2, 4, 5, 
    6, 8, 9, 10, 11, 12), year = list(2014L, 2014L, 2014L, 2014L, 
    2014L, 2014L, 2014L, 2014L, 2014L, 2014L)), row.names = c(NA, 
10L), class = "data.frame")

z = structure(list(state = c("Alaska", "Alaska", "Alaska", "Alaska", 
"Alaska", "Alaska", "Alaska", "Alaska", "Alaska", "Alabama"), 
    state_num = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), year = c(2014L, 
    2015L, 2016L, 2017L, 2018L, 2019L, 2020L, 2021L, 2022L, 2014L
    ), z1 = 1:10, z2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L), z3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 
10L), class = "data.frame")

I have tried to merge them like this:

result = full_join(x,y) %>% full_join(z)

But I get the following error:

Error in `full_join()`:
! Can't join `x$state_num` with `y$state_num` due to incompatible types.
i `x$state_num` is a <list>.
i `y$state_num` is a <integer>.

Does anyone know what I am doing wrong or better yet, how to fix the problem?


Solution

  • As stated in the error, you have nested lists as columns in x.

    Probably you can try

    Reduce(
      full_join,
      lapply(
        list(x, y, z),
        \(d) list2DF(lapply(d, unlist))
      )
    )
    

    which gives

               x1         x2 state_num year          y1          y2   state z1 z2
    1          NA         NA         1 2014          NA          NA Alabama 10  0
    2          NA         NA         2 2014          NA          NA  Alaska  1  0
    3          NA         NA         4 2014          NA          NA    <NA> NA NA
    4  0.06824837 0.02145278         5 2014 0.000000000 0.000000000    <NA> NA NA
    5  0.07222502 0.01167807         6 2014 0.012752938 0.004845948    <NA> NA NA
    6          NA         NA         8 2014          NA          NA    <NA> NA NA
    7  0.08450198 0.02144446         9 2014 0.006536965 0.003802864    <NA> NA NA
    8  0.04202505 0.01653600        10 2014 0.023338554 0.011821193    <NA> NA NA
    9          NA         NA        11 2014          NA          NA    <NA> NA NA
    10 0.06886737 0.02477340        12 2014 0.000000000 0.000000000    <NA> NA NA
    11         NA         NA         2 2015          NA          NA  Alaska  2  0
    12         NA         NA         2 2016          NA          NA  Alaska  3  0
    13         NA         NA         2 2017          NA          NA  Alaska  4  0
    14         NA         NA         2 2018          NA          NA  Alaska  5  0
    15         NA         NA         2 2019          NA          NA  Alaska  6  0
    16         NA         NA         2 2020          NA          NA  Alaska  7  0
    17         NA         NA         2 2021          NA          NA  Alaska  8  0
    18         NA         NA         2 2022          NA          NA  Alaska  9  0
       z3
    1   0
    2   0
    3  NA
    4  NA
    5  NA
    6  NA
    7  NA
    8  NA
    9  NA
    10 NA
    11  0
    12  0
    13  0
    14  0
    15  0
    16  0
    17  0
    18  0