I am trying to merge3 datasets, which are part of longitudinal study. The variable names are a identical across the waves of the study. What I want to accomplish is to attach a suffix to each variable name to indicate the dataset it belongs to while merging them. I have been able to do that in a roundabout way, but I am thinking there should be an easier way to accomplish that. Below is what I have done so far, just using some fictitious data to illustrate.
## step 1: create three dfs with identical variable names
df1 = data.frame('ID' = 1:10, 'V1' = sample(x=1:5, size=10, replace=T), 'v2' = sample(x=1:5, size=10, replace=T))
df2 = data.frame('ID' = 1:10, 'V1' = sample(x=1:5, size=10, replace=T), 'v2' = sample(x=1:5, size=10, replace=T))
df3 = data.frame('ID' = 1:10, 'V1' = sample(x=1:5, size=10, replace=T), 'v2' = sample(x=1:5, size=10, replace=T))
## step 2: function to rename all columns in a dataset
colsRename <- function(data, nn){
nn = nn
df = data %>%
rename_with(~ paste0(colnames(data), paste0('_'), nn)[which(colnames(data) == .x)],
.cols = colnames(data)) %>%
glimpse()
return(df)
}
## call function to rename columns
w1 = colsRename(df1, 'W1')
w2 = colsRename(df2, 'W2')
w3 = colsRename(df3, 'W3')
# merge the dataframes
dfMerge = merge(w1, w2, by.x = c("ID_W1"), by.y = c("ID_W2"), all = FALSE) %>%
merge(., w3, by.x = c("ID_W1"), by.y = c("ID_W3"), all = FALSE) %>%
glimpse()
The final output for the joined/merged dataframes will look something like below
ID_W1 V1_W1 v2_W1 V1_W2 v2_W2 V1_W3 v2_W3
1 1 3 4 1 5 5 1
2 2 4 2 5 4 5 2
3 3 4 3 2 4 1 3
4 4 5 1 1 2 5 1
5 5 5 1 3 1 5 1
6 6 4 4 3 4 3 5
7 7 2 5 3 2 3 2
8 8 1 1 2 1 2 2
9 9 5 3 2 2 1 3
10 10 5 2 5 3 4 5
You can put the dataframe in a named list, rename the dataframes and use reduce
-
library(dplyr)
library(purrr)
list(W1 = df1, W2 = df2, W3 = df3) %>%
imap(function(x, y) x %>% rename_with(~paste(., y, sep = '_'), -ID)) %>%
reduce(inner_join, by = 'ID')
# ID V1_W1 v2_W1 V1_W2 v2_W2 V1_W3 v2_W3
#1 1 1 1 5 2 5 5
#2 2 3 2 4 4 3 3
#3 3 4 3 5 4 1 2
#4 4 1 4 2 3 4 5
#5 5 3 5 1 1 1 5
#6 6 5 5 1 2 1 3
#7 7 4 3 3 1 3 2
#8 8 2 1 1 2 4 2
#9 9 5 2 5 4 1 2
#10 10 1 5 1 5 3 4
If there are more dataframe and you don't want to name them individually you can collect all the dataframes from the global environment using mget
.
mget(ls(pattern = 'df\\d+')) %>%
imap(function(x, y) x %>% rename_with(~paste(., y, sep = '_'), -ID)) %>%
reduce(inner_join, by = 'ID')