Search code examples
rdplyrplyrcbind

Replacement of plyr::cbind.fill in dplyr?


I apologize if this question is elementary, but I've been scouring the internet and I can't seem to find a simple solution.

I currently have a list of R objects (named vectors or dataframes of 1 variable, I can work with either), and I want to join them into 1 large dataframe with 1 row for each unique name/rowname, and 1 column for each element in the original list.

My starting list looks something like:

l1 <- list(df1 = data.frame(c(1,2,3), row.names = c("A", "B", "C")), 
       df2 = data.frame(c(2,6), row.names = c("B", "D")),
       df3 = data.frame(c(3,6,9), row.names = c("C", "D", "A")),
       df4 = data.frame(c(4,12), row.names = c("A", "E")))

And I want the output to look like:

data.frame("df1" = c(1,2,3,NA,NA),
+            "df2" = c(NA,2,NA,6,NA),
+            "df3" = c(9,NA,3,6,NA),
+            "df4" = c(4,NA,NA,NA,12), row.names = c("A", "B", "C", "D", "E"))
  df1 df2 df3 df4
A   1  NA   9   4
B   2   2  NA  NA
C   3  NA   3  NA
D  NA   6   6  NA
E  NA  NA  NA  12

I don't mind if the fill values are NA or 0 (ultimately I want 0 but that's an easy fix).

I'm almost positive that plyr::cbind.fill does exactly this, but I have been using dplyr in the rest of my script and I don't think using both is a good idea. dplyr::bind_cols does not seem to work with vectors of different lengths. I'm aware a very similar question has been asked here: R: Is there a good replacement for plyr::rbind.fill in dplyr? but as I mentioned, this solution doesn't actually seem to work. Neither does dplyr::full_join, even wrapped in a do.call. Is there a straightforward solution to this, or is the only solution to write a custom function?


Solution

  • Here's a way with some purrr and dplyr functions. Create column names to represent each data frame—since each has only one column, this is easy with setNames, but with more columns you could use dplyr::rename. Do a full-join across the whole list based on the original row names, and fill NAs with 0.

    library(dplyr)
    library(purrr)
    
    l1 %>%
      imap(~setNames(.x, .y)) %>%
      map(tibble::rownames_to_column) %>%
      reduce(full_join, by = "rowname") %>%
      mutate_all(tidyr::replace_na, 0)
    #>   rowname df1 df2 df3 df4
    #> 1       A   1   0   9   4
    #> 2       B   2   2   0   0
    #> 3       C   3   0   3   0
    #> 4       D   0   6   6   0
    #> 5       E   0   0   0  12