
cbind column in several csv files in r


I am new to R and don't know exactly how to write for loops. Here is my problem: I have about 160 csv files in a folder, each with a specific name. Each file name follows the pattern "HL.X.Y.Z.", where X = "Region", Y = "Cluster", and Z = "Point". What I need to do is read all these csv files, extract the strings from the names, create a column with those strings for each csv file, and bind all the files into a single data frame. Here is some code showing what I am trying to do:

setwd("C:/Users/worddirect")
files.names<-list.files(getwd(),pattern="*.csv")
files.names 
head(files.names)
>[1] "HL.1.1.1.2F31CA.150722.csv"  "HL.1.1.2.2F316A.150722.csv" 
 [3] "HL.1.1.3.2F3274.150722.csv"  "HL.1.1.4.2F3438.csv"        
 [5] "HL.1.10.1.3062CD.150722.csv" "HL.1.10.2.2F343D.150722.csv"

Reading all the files like this works just fine:

files.names
    for (i in 1:length(files.names)) {
    assign(files.names[i], read.csv(files.names[i],skip=18))
            }

Adding the extra columns to an individual csv file like this also works fine:

test<-cbind("Region"=rep(substring(files.names[1],4,4),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
        "Cluster"=rep(substring(files.names[1],6,6),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
        "Point"=rep(substring(files.names[1],8,8),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
        HL.1.1.1.2F31CA.150722.csv)
 head(test)
  Region Cluster Point          Date.Time Unit  Value
1      1       1     1 6/2/14 11:00:01 PM    C 24.111
2      1       1     1  6/3/14 1:30:01 AM    C 21.610
3      1       1     1  6/3/14 4:00:01 AM    C 20.609

However, a for loop that does the above doesn't work.

files.names
    for (i in 1:length(files.names)) {
    assign(files.names[i], read.csv(files.names[i],skip=18))
    cbind("Region"=rep(substring(files.names[i],4,4),times=nrow(i)),
        "Cluster"=rep(substring(files.names[i],6,6),times=nrow(i)),
        "Point"=rep(substring(files.names[i],8,8),times=nrow(i)),
        i)
            }
>Error in rep(substring(files.names[i], 4, 4), times = nrow(i)) : 
  invalid 'times' argument

The final step would be to bind all the csv files in a single data frame.

I appreciate any suggestions. If there is a simpler way to do what I did, I would appreciate that too!


Solution

  • There are many ways to solve a problem in R. A more R-like way to solve this one is with a function from the apply() family. These functions act like an implied for loop: they apply a function, passed as an argument, to each item of the object they are given.
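
    As a small illustration of the "implied for loop" idea, the sketch below computes the same result twice, once with an explicit loop and once with lapply(). The vector x and the helper square() are made up for this example and are not part of the question.

    ```r
    x <- c(1, 2, 3)

    # hypothetical helper used only for this illustration
    square <- function(v) v^2

    # explicit for loop: preallocate a list and fill it element by element
    squaresLoop <- vector("list", length(x))
    for (i in seq_along(x)) {
      squaresLoop[[i]] <- square(x[i])
    }

    # lapply(): the loop is implied, and the results come back as a list
    squaresApply <- lapply(x, square)

    # both approaches give the same list
    identical(squaresLoop, squaresApply)
    ```

    Because lapply() always returns a list, its output can be handed directly to do.call() when the pieces need to be combined afterwards.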

    Another important feature of R is the anonymous function. By combining lapply() with an anonymous function, we can solve your multi-file read problem.

    setwd("C:/Users/worddirect")
    # pattern is a regular expression, not a glob: match names ending in .csv
    files.names <- list.files(getwd(), pattern="\\.csv$")
    # read csv files and return them as items in a list()
    theList <- lapply(files.names, function(x){
         theData <- read.csv(x, skip=18)
         # split the name on "." so multi-digit values (e.g. cluster 10) stay whole
         parts <- strsplit(x, ".", fixed=TRUE)[[1]]
         # bind the region, cluster, and point columns to the data and return
         cbind(
              "Region"=rep(parts[2], times=nrow(theData)),
              "Cluster"=rep(parts[3], times=nrow(theData)),
              "Point"=rep(parts[4], times=nrow(theData)),
              theData)
    })
    # rbind the data frames in theList into a single data frame
    theResult <- do.call(rbind, theList)
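
    If you would rather keep a for loop, note where the error in the question comes from: inside the loop, i is the loop index (an integer), not a data frame, so nrow(i) returns NULL and rep() rejects it as the 'times' argument. The following is only a sketch of one way to write the loop; parseName() and readAll() are hypothetical helper names introduced here. It also splits the file name on dots rather than using fixed character positions, since the listing in the question includes names such as "HL.1.10.1.3062CD.150722.csv", where the cluster has two digits.

    ```r
    # Hypothetical helper: split a name such as "HL.1.10.1.3062CD.150722.csv"
    # on the dots and keep fields 2 to 4 (region, cluster, point).
    parseName <- function(fileName) {
      parts <- strsplit(basename(fileName), ".", fixed = TRUE)[[1]]
      list(Region = parts[2], Cluster = parts[3], Point = parts[4])
    }

    # Hypothetical helper: read every file, add the three columns, bind all rows.
    readAll <- function(files, skip = 18) {
      pieces <- vector("list", length(files))
      for (i in seq_along(files)) {
        # keep the data frame in a variable instead of using assign()
        theData <- read.csv(files[i], skip = skip)
        ids <- parseName(files[i])
        # the single values are recycled to every row of theData
        pieces[[i]] <- cbind(Region = ids$Region,
                             Cluster = ids$Cluster,
                             Point = ids$Point,
                             theData)
      }
      # stack the per-file data frames into one
      do.call(rbind, pieces)
    }
    ```

    With these helpers, something like theResult <- readAll(files.names) should give the same kind of combined data frame as the lapply() version, assuming every file has the same columns after the 18 skipped lines.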
    

    regards,

    Len