r dataframe cbind

cbind data without specifying inputs


I have a piece of code in which I want to cbind my data. The catch is that I will not always have eight data frames to cbind. I would like to keep the code below and import just five if I only have five. The reason is this: I will always have between 1 and 100 data frames to cbind, and I don't want to manually tell R each time whether to cbind one or 100. I want something like cbind(1:100) that always binds whatever is there to bind.

finaltable <- cbind(onea, twoa, threea, foura, fivea, sixa, sevena, eighta)

Solution

  • Without more data, here's a contrived example. First, I'll make some example files with the same number of rows in each:

    # Create four single-column example files, five rows each
    filenames <- paste0(c('onea', 'twoa', 'threea', 'foura'), '.csv')
    for (fn in filenames)
        write.csv(matrix(runif(5), ncol = 1), file = fn, row.names = FALSE)
    

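    Let's first dynamically derive the list of filenames to process. (This step does not rely on the code above that created the files; it picks up whatever CSV files are in the working directory.) Note that pattern is a regular expression, not a shell glob, so the dot is escaped and the match is anchored to the end of the name:

    (filenames <- list.files(pattern = '\\.csv$'))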
    ##  [1] "foura.csv"  "onea.csv"   "threea.csv" "twoa.csv"  
    

    This is the "hard" part, reading the files:

    (ret <- do.call(cbind, lapply(filenames,
                                  function(fn) read.csv(fn, header = TRUE))))
    ##           V1        V1        V1        V1
    ##  1 0.9091705 0.4934781 0.7607488 0.4267438
    ##  2 0.9692987 0.4349523 0.6066990 0.9134305
    ##  3 0.6444404 0.8639983 0.1473830 0.9844336
    ##  4 0.7719652 0.1492200 0.7731319 0.9689941
    ##  5 0.9237107 0.6317367 0.2565866 0.1084299
    

    As a proof of concept, here's the same call on a subset of the vector of filenames, showing that the length of the vector does not matter:

    (ret <- do.call(cbind, lapply(filenames[1:2],
                                  function(fn) read.csv(fn, header = TRUE))))
    ##           V1        V1
    ##  1 0.9091705 0.4934781
    ##  2 0.9692987 0.4349523
    ##  3 0.6444404 0.8639983
    ##  4 0.7719652 0.1492200
    ##  5 0.9237107 0.6317367
    

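    Every column here is named V1, so name-based access such as ret$V1 only ever returns the first one. You may want to rename the columns (with names(ret) <- filenames, for example, which works when each file contributes exactly one column), but you can always reference columns by position (e.g., ret[,2]) without worrying about names.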
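The pieces above can be wrapped into one function. What follows is only a minimal sketch, not part of the original answer: the name cbind_csv_files, its arguments, and the row-count check are assumptions made for illustration. It reads every matching file, stops early if the row counts differ, and names each column after the file it came from.

```r
# Hypothetical helper (the name and arguments are illustrative assumptions):
# read every CSV in `dir`, check the row counts agree, and cbind the results.
cbind_csv_files <- function(dir = ".", pattern = "\\.csv$") {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(paths) == 0) stop("no files matching the pattern in ", dir)

  tables <- lapply(paths, read.csv, header = TRUE)

  # Fail early with a clear message if the files cannot be lined up row by row.
  n_rows <- vapply(tables, nrow, integer(1))
  if (length(unique(n_rows)) != 1) stop("files have differing row counts")

  combined <- do.call(cbind, tables)

  # Name columns after their source file ("onea.csv" -> "onea"); a file with
  # several columns yields "onea", "onea.1", ... via make.unique().
  stems <- tools::file_path_sans_ext(basename(paths))
  n_cols <- vapply(tables, ncol, integer(1))
  names(combined) <- make.unique(rep(stems, times = n_cols))
  combined
}

# Usage: write three example files to a temporary directory, then combine them.
demo_dir <- tempfile("cbind_demo")
dir.create(demo_dir)
for (stem in c("onea", "twoa", "threea")) {
  write.csv(data.frame(V1 = runif(5)),
            file = file.path(demo_dir, paste0(stem, ".csv")),
            row.names = FALSE)
}
ret <- cbind_csv_files(demo_dir)
names(ret)  # columns follow the alphabetical file order from list.files()
```

Because the file list is discovered at run time, the same call works whether the directory holds one file or a hundred; only the pattern needs to change if the files follow a different naming scheme.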