I have three dataframes like below:
df3 <- data.frame(col1=c('A','C','E'),col2=c(4,8,2))
df2 <- data.frame(col1=c('A','B','C','E','I'),col2=c(4,6,8,2,9))
df1 <- data.frame(col1=c('A','D','C','E','I'),col2=c(4,7,8,2,9))
The differences between any two files could be as below:
anti_join(df2, df3)
# Joining, by = c("col1", "col2")
# col1 col2
# 1 B 6
# 2 I 9
anti_join(df3, df2)
# Joining, by = c("col1", "col2")
# [1] col1 col2
# <0 rows> (or 0-length row.names)
anti_join(df1, df2)
# Joining, by = c("col1", "col2")
# col1 col2
# 1 D 7
anti_join(df2, df1)
# Joining, by = c("col1", "col2")
# col1 col2
# 1 B 6
I would like to create a master dataframe with all the values in col1
and col2
specific to each dataframe. If there is no such value present, it should populate NA
.
col1 df1_col2 df2_col2 df3_col2
1 A 4 4 4
2 B NA 6 NA
3 C 8 8 8
4 E 2 2 2
5 I 9 9 NA
6 D 7 NA NA
The essence of the above output could be established from the above anti_join
commands. However, it does not provide the complete picture at once. Any thoughts on how to achieve this?
Edit: For multiple values in col2
for col1
, the output is a little messier. For example, A
has values 4
, 3
.
df3 <- data.frame(col1=c('A','C','E'),col2=c(4,8,2))
df2 <- data.frame(col1=c('A','A','B','C','E','I'),col2=c(4,3,6,8,2,9))
df1 <- data.frame(col1=c('A','A','D','C','E','I'),col2=c(4,3,7,8,2,9))
lst_of_frames <- list(df1 = df1, df2 = df2, df3 = df3)
lst_of_frames %>%
imap(~ rename_at(.x, -1, function(z) paste(.y, z, sep = "_"))) %>%
reduce(full_join, by = "col1")
It gives the below output.
# col1 df1_col2 df2_col2 df3_col2
# 1 A 4 4 4
# 2 A 4 3 4
# 3 A 3 4 4
# 4 A 3 3 4
# 5 D 7 NA NA
# 6 C 8 8 8
# 7 E 2 2 2
# 8 I 9 9 NA
# 9 B NA 6 NA
The interesting part of the output is:
# col1 df1_col2 df2_col2 df3_col2
# 1 A 4 4 4
# 2 A 4 3 4
# 3 A 3 4 4
# 4 A 3 3 4
whereas the expected output is:
# col1 df1_col2 df2_col2 df3_col2
# 1 A 4 4 4
# 2 A 3 3 NA
Similar to @tamtam's answer, but a little programmatic if you have a dynamic list of frames.
lst_of_frames <- list(df1 = df1, df2 = df2, df3 = df3)
# lst_of_frames <- tibble::lst(df1, df2, df3) # thanks, @user63230
library(dplyr)
library(purrr) # imap, reduce
lst_of_frames %>%
imap(~ rename_at(.x, -1, function(z) paste(.y, z, sep = "_"))) %>%
reduce(full_join, by = "col1")
# col1 df1_col2 df2_col2 df3_col2
# 1 A 4 4 4
# 2 D 7 NA NA
# 3 C 8 8 8
# 4 E 2 2 2
# 5 I 9 9 NA
# 6 B NA 6 NA
It's important (for automatically renaming the columns) that the list-of-frames be a named list; my assumption was the name of the frame variable list(df1=df1)
, but it could just as easily be list(A=df1)
to produce a column named A_col2
in the end.