I'm trying to merge a list of data frames with the Reduce function,
and I'm struggling to replace the '.x' and '.y' suffixes that merge adds to duplicate column names with the names of the data frames.
dat01_characterization<-data.frame(usubjid = as.factor(sample(10)), col2 = letters[1:10], col3 = letters[1:10])
dat02_consent<-data.frame(usubjid = as.factor(sample(10)), col3 = letters[1:10], col4 = letters[1:10])
dat03_psqi<-data.frame(usubjid = as.factor(sample(10)), col5 = letters[1:10], col3 = letters[1:10])
l2<-mget(ls(pattern="dat0"))
#l2<-list(dat01_characterization,dat02_consent,dat03_psqi)
mergefunction<-function(x,y){
xname<-substr(names(x),regexpr("_",names(x))+1,nchar(names(x)))
yname<-substr(names(y),regexpr("_",names(y))+1,nchar(names(y)))
merged_data<-merge(x,y,by=c("usubjid"),all=TRUE)
colnames(merged_data)<-gsub("\\.x",paste0("\\.",xname),names(merged_data))
colnames(merged_data)<-gsub('\\.y',paste0("\\.",yname),names(merged_data))
return(merged_data)
}
bbb<-Reduce(function(x,y) mergefunction(x,y),l2)
Using names() on the arguments inside the Reduce function gives me the column names of the data frames, just as calling names() on a list element l2[[1]] does, rather than the element name I would get from the higher-level object l2[1]. Any idea how to access the actual data frame names (i.e. dat01_characterization, etc.)?
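To make the difference between the two calls concrete, here is a minimal sketch. The list `l` and its two small data frames are made up for illustration and are not part of the data above.

```r
# Hypothetical two-element named list, built only for illustration
l <- list(dat01_a = data.frame(id = 1:2, v = 3:4),
          dat02_b = data.frame(id = 1:2, w = 5:6))

# Single brackets return a one-element list, so names() gives the element name
element_name <- names(l[1])

# Double brackets return the data frame itself, so names() gives its columns
column_names <- names(l[[1]])

print(element_name)
print(column_names)
```

Inside Reduce the function receives the equivalent of `l[[1]]`, which is why only column names are visible there.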
+++UPDATE+++
It didn't work with the original Reduce function, so I had to write my own version with a for loop. Here's how that works:
dat01_characterization2<-data.frame(usubjid = as.factor(sample(10)), col2 = letters[1:10], col3 = letters[1:10])
dat02_consent2<-data.frame(usubjid = as.factor(sample(10)), col3 = letters[1:10], col4 = letters[1:10])
dat03_psqi2<-data.frame(usubjid = as.factor(sample(10)), col5 = letters[1:10], col3 = letters[1:10])
l3<-mget(ls(pattern="dat0"))
out<-l3[[1]]
for(i in 2:length(l3)){
yname<-substr(names(l3[i]),regexpr("_",names(l3[i]))+1,nchar(names(l3[i])))
out<-merge(out,l3[[i]],by=c("usubjid"),all=TRUE)
colnames(out)<-gsub("\\.x","",names(out))
colnames(out)<-gsub('\\.y',paste0("\\.",yname),names(out))
}
dat01_characterization, dat02_consent and dat03_psqi are not names of the data frames but names of the variables that hold the data frame contents. Once the variables have been evaluated into the list l2, the data frames themselves no longer carry those names, so the function called by Reduce sees only their contents. See str(l2)
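One possible way around this, sketched below under some assumptions, is to attach the names before merging, while the list names are still available, and only then call Reduce. The helper `tag_shared` and the variables `key`, `shared` and `tagged` are hypothetical names introduced for this sketch; the list is built explicitly rather than with mget so that no other objects in the workspace are picked up.

```r
# Example data as in the question; set.seed() is added only for reproducibility
set.seed(1)
l2 <- list(
  dat01_characterization = data.frame(usubjid = as.factor(sample(10)),
                                      col2 = letters[1:10], col3 = letters[1:10]),
  dat02_consent = data.frame(usubjid = as.factor(sample(10)),
                             col3 = letters[1:10], col4 = letters[1:10]),
  dat03_psqi = data.frame(usubjid = as.factor(sample(10)),
                          col5 = letters[1:10], col3 = letters[1:10])
)

key <- "usubjid"

# Columns (other than the key) that occur in more than one data frame
all_cols <- unlist(lapply(l2, names))
shared <- setdiff(unique(all_cols[duplicated(all_cols)]), key)

# Hypothetical helper: append the part of the list name after the first "_"
# to every shared column of one data frame
tag_shared <- function(df, nm) {
  suffix <- sub("^[^_]*_", "", nm)
  hit <- names(df) %in% shared
  names(df)[hit] <- paste0(names(df)[hit], ".", suffix)
  df
}

# Map() walks the data frames and their list names in parallel
tagged <- Map(tag_shared, l2, names(l2))

# After tagging there are no duplicate columns, so merge() adds no .x/.y
merged <- Reduce(function(x, y) merge(x, y, by = key, all = TRUE), tagged)
print(names(merged))
```

Because the shared columns are renamed up front, merge never has to invent '.x' and '.y' suffixes, and the first data frame's col3 gets a suffix as well, unlike in the for loop version, which leaves it bare.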