I am new to R and dont know exactly how to do for loops. Here is my problem: I have about 160 csv files in a folder, each with a specific name. In each file, there is a pattern:"HL.X.Y.Z.", where X="Region", Y="cluster", and Z="point". What i need to do is read all these csv files, extract strings from the names, create a column with the strings for each csv file, and bind all these csv files in a single data frame. Here is some code of what i am trying to do:
setwd("C:/Users/worddirect")
files.names<-list.files(getwd(),pattern="*.csv")
files.names
head(files.names)
>[1] "HL.1.1.1.2F31CA.150722.csv" "HL.1.1.2.2F316A.150722.csv"
[3] "HL.1.1.3.2F3274.150722.csv" "HL.1.1.4.2F3438.csv"
[5] "HL.1.10.1.3062CD.150722.csv" "HL.1.10.2.2F343D.150722.csv"
Doing like this to read all files works just fine:
files.names
for (i in 1:length(files.names)) {
assign(files.names[i], read.csv(files.names[i],skip=18))
}
Adding an extra column for an individual csv files like this works fine:
test<-cbind("Region"=rep(substring(files.names[1],4,4),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
"Cluster"=rep(substring(files.names[1],6,6),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
"Point"=rep(substring(files.names[1],8,8),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
HL.1.1.1.2F31CA.150722.csv)
head(test)
Region Cluster Point Date.Time Unit Value
1 1 1 1 6/2/14 11:00:01 PM C 24.111
2 1 1 1 6/3/14 1:30:01 AM C 21.610
3 1 1 1 6/3/14 4:00:01 AM C 20.609
However, a for loop of the above doesn`t work.
files.names
for (i in 1:length(files.names)) {
assign(files.names[i], read.csv(files.names[i],skip=18))
cbind("Region"=rep(substring(files.names[i],4,4),times=nrow(i)),
"Cluster"=rep(substring(files.names[i],6,6),times=nrow(i)),
"Point"=rep(substring(files.names[i],8,8),times=nrow(i)),
i)
}
>Error in rep(substring(files.names[i], 4, 4), times = nrow(i)) :
invalid 'times' argument
The final step would be to bind all the csv files in a single data frame.
I appreciate any suggestion. If there is any simpler way to do what i did i appreciate too!
There are many ways to solve a problem in R. A more R-like way to solve this problem is with an apply()
function. The apply()
family of functions acts like an implied for loop, applying one or more operations to each item in passed to it via a function argument.
Another important feature of R is the anonymous function. Combining lapply()
with an anonymous function we can solve your multi file read problem.
setwd("C:/Users/worddirect")
files.names<-list.files(getwd(),pattern="*.csv")
# read csv files and return them as items in a list()
theList <- lapply(files.names,function(x){
theData <- read.csv(x,skip=18)
# bind the region, cluster, and point data and return
cbind(
"Region"=rep(substring(x,4,4),times=nrow(theData)),
"Cluster"=rep(substring(x,6,6),times=nrow(theData)),
"Point"=rep(substring(x,8,8),times=nrow(theData)),
theData)
})
# rbind the data frames in theList into a single data frame
theResult <- do.call(rbind,theList)
regards,
Len