I apologize if this question is elementary, but I've been scouring the internet and I can't seem to find a simple solution.
I currently have a list of R objects (named vectors or dataframes of 1 variable, I can work with either), and I want to join them into 1 large dataframe with 1 row for each unique name/rowname, and 1 column for each element in the original list.
My starting list looks something like:
l1 <- list(df1 = data.frame(c(1,2,3), row.names = c("A", "B", "C")),
df2 = data.frame(c(2,6), row.names = c("B", "D")),
df3 = data.frame(c(3,6,9), row.names = c("C", "D", "A")),
df4 = data.frame(c(4,12), row.names = c("A", "E")))
And I want the output to look like:
data.frame("df1" = c(1,2,3,NA,NA),
+ "df2" = c(NA,2,NA,6,NA),
+ "df3" = c(9,NA,3,6,NA),
+ "df4" = c(4,NA,NA,NA,12), row.names = c("A", "B", "C", "D", "E"))
df1 df2 df3 df4
A 1 NA 9 4
B 2 2 NA NA
C 3 NA 3 NA
D NA 6 6 NA
E NA NA NA 12
I don't mind if the fill values are NA or 0 (ultimately I want 0 but that's an easy fix).
I'm almost positive that plyr::cbind.fill
does exactly this, but I have been using dplyr in the rest of my script and I don't think using both is a good idea. dplyr::bind_cols
does not seem to work with vectors of different lengths. I'm aware a very similar question has been asked here: R: Is there a good replacement for plyr::rbind.fill in dplyr?
but as I mentioned, this solution doesn't actually seem to work. Neither does dplyr::full_join
, even wrapped in a do.call
. Is there a straightforward solution to this, or is the only solution to write a custom function?
Here's a way with some purrr
and dplyr
functions. Create column names to represent each data frame—since each has only one column, this is easy with setNames
, but with more columns you could use dplyr::rename
. Do a full-join across the whole list based on the original row names, and fill NA
s with 0.
library(dplyr)
library(purrr)
l1 %>%
imap(~setNames(.x, .y)) %>%
map(tibble::rownames_to_column) %>%
reduce(full_join, by = "rowname") %>%
mutate_all(tidyr::replace_na, 0)
#> rowname df1 df2 df3 df4
#> 1 A 1 0 9 4
#> 2 B 2 2 0 0
#> 3 C 3 0 3 0
#> 4 D 0 6 6 0
#> 5 E 0 0 0 12