
How can I merge many data frames from csv files when the ID column is implied?


I'd like to merge a bunch of data frames together (because it seems many operations are easier if you're only dealing w/ one, but correct me if I'm wrong).

Currently I have one data frame like this:

ID, var1, var2
A,  2,    2
B,  4,    5
.
.
Z,  3,    2

Each ID is on a single row w/ several single measurements.

I also have a csv file w/ repeated measurements for each ID, like:

filename = ID_B.csv

time, var4, var5
0,    1,    2
1,    4,    5
2,    1,    6
...

What I'd like is:

ID, time, var1, var2, var4, var5
...
B,  0,    4,    5,    1,    2
B,  1,    4,    5,    4,    5
B,  2,    4,    5,    1,    6
...

I don't really care about the column order. The only solution I can think of is to add the ID column to each csv file then loop through them calling merge() several times. Is there a more elegant approach?


Solution

  • My understanding is that you need to extract the ID from the filename, and then merge the imported csv with the existing dataframe.

    df1 <- read.csv(textConnection("ID, var1, var2
    A,  2,    2
    B,  4,    5"))
    
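    # assuming the imported csv files are in the working directory
    # (the dot is escaped and the pattern anchored so only ID_<letter>.csv matches)
    filenames <- list.files(getwd(), pattern = "^ID_[A-Z]\\.csv$")
    
    # extract the ID from each filename
    ids <- gsub("^ID_([A-Z])\\.csv$", "\\1", filenames)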
    
    # import csv-files and append ID
    library(plyr)
    import <- mdply(filenames, read.csv)
    import$ID <- ids[import$Var1]
    import$Var1 <- NULL
    
    # merge imported csv-files and the existing dataframe
    merge(df1, import)  
    

    Result:

      ID var1 var2 time var4 var5
    1  B    4    5    0    1    2
    2  B    4    5    1    4    5
    3  B    4    5    2    1    6
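
If you'd rather not depend on plyr, the same idea (take the ID from the filename, stack the files, merge once) can be sketched in base R. This is only a sketch under some assumptions: the temporary directory, the helper name `read_with_id`, and the contents of `ID_A.csv` are made up for illustration, while `ID_B.csv` mirrors the example in the question.

```r
# Hypothetical sample files, written to a temporary directory so the
# sketch is self-contained; ID_A.csv is invented, ID_B.csv follows the question
csv_dir <- tempfile("csv_")
dir.create(csv_dir)
write.csv(data.frame(time = 0:2, var4 = c(1, 4, 1), var5 = c(2, 5, 6)),
          file.path(csv_dir, "ID_B.csv"), row.names = FALSE)
write.csv(data.frame(time = 0:1, var4 = c(7, 8), var5 = c(9, 3)),
          file.path(csv_dir, "ID_A.csv"), row.names = FALSE)

# The existing data frame with one row per ID
df1 <- data.frame(ID = c("A", "B"), var1 = c(2, 4), var2 = c(2, 5),
                  stringsAsFactors = FALSE)

# Only pick up files named ID_<capital letter>.csv
filenames <- list.files(csv_dir, pattern = "^ID_[A-Z]\\.csv$",
                        full.names = TRUE)

# Read one file and tag every row with the ID taken from its filename
read_with_id <- function(path) {
  dat <- read.csv(path)
  dat$ID <- sub("^ID_([A-Z])\\.csv$", "\\1", basename(path))
  dat
}

# Stack all files into one data frame, then merge a single time on ID
import <- do.call(rbind, lapply(filenames, read_with_id))
result <- merge(df1, import, by = "ID")
print(result)
```

With this, merge() is called only once no matter how many files there are. If some IDs in df1 have no csv file and you still want to keep them, passing all.x = TRUE to merge() should do that, with NA in the time/var4/var5 columns.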