Search code examples
rmergesemi-join

R merge multiple files into 1 dataframe and keep all values


I have 6 multiple files, each having only 1 column of names. I want to read all these files and combine them in 1 dataframe so that it looks like this:

file1  file2  file3  file4  file5  file6
adam   adam   adam   adam   adam   adam
Roy    NA     Roy    Roy    NA     NA
NA     Sam    Sam    NA     NA     NA

The colnames of resulting dataframe should represent the actual file names. Let's say that the file I read were named file1.txt, file2.txt and so on..

Any help will be much appreciated.

What I've been trying so far:

multmerge = function(mypath){
+ filenames=list.files(path=mypath, full.names=TRUE, pattern = "\\.txt$")
+ datalist = lapply(filenames, function(x){read.csv(file=x,header=F)})
+ Reduce(function(x,y) {merge(x,y, all.x=T)}, datalist)}
> mymergeddata = multmerge("/Path/To/The/Folder/Having/All/Files")
> dim(mymergeddata)
[1] 11508     1

As seen here it is combining all values (names) in one column..


Solution

  • Here's one way to do that with rbindlist and dcast from data.table

    library(data.table)
    file_list <- list.files("c:/temp/files/",full.names = TRUE)
    import_files <- lapply(file_list,read.csv,stringsAsFactors =FALSE)
    rbinded_files <- na.omit(rbindlist(import_files,idcol="file"))
    dcast(rbinded_files,file1 ~file,fun=max, na.rm=TRUE)
    
      file1    1    2    3    4    5    6
    1  adam adam adam adam adam adam adam
    2   Roy  Roy <NA>  Roy  Roy <NA> <NA>
    3   Sam <NA>  Sam  Sam <NA> <NA> <NA>
    

    You can remove the first column if you want.