I am not very good at R, so sorry for the clumsy question.
I have a large number of datasets. I want to use a loop to load them all and do some cleaning. Later I also need to run each of them through a cluster analysis.
This is what I want to do for one file:
data1 <- read.table("Datasæt Pris 1.csv",
header = TRUE,
sep = "|")
rownames(data1) = data1$LPERMNO
data1$LPERMNO = NULL
data1<-data1[, colSums(is.na(data1)) != nrow(data1)]
data1[is.na(data1)] <- 0
I found a way to do the first part in a loop, and this works:
for (i in 42:44){
assign(paste0("Data", i), read.table(paste("Datasæt Pris ",i,".csv",sep = ""),
header = TRUE,
sep = "|"))
}
However, when I add the next step, it no longer works. I have tried a million different variations of get(), assign(), cat(), paste() and eval(). As an example, I tried:
for (i in 42:44){
assign(paste0("Data", i), read.table(paste("Datasæt Pris ",i,".csv",sep = ""),
header = TRUE,
sep = "|"))
rownames(cat("Data", i, sep="")) = cat("Data", i, "$LPERMNO",sep="")
cat("Data", i, "$LPERMNO",sep="") = NULL
}
The error I get here is: Data42$LPERMNO Error in rownames(cat("Data", i, sep = "")) = cat("Data", i, "$LPERMNO", : target of assignment expands to non-language object
I have read every thread I could find. But I am in a bit of a hurry, since my thesis has to be handed in on May 17th, and I am almost at the point where I will do every step manually.
As a last try, I thought I would post a question here, and I hope it is understandable!
Best regards
Emilie
This is typically done with one of the apply functions. They collect the output of each iteration and return it all together. lapply is a good fit here: it returns a list containing whatever each iteration produces.
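list.with.data.sets <- lapply( 42:44, function(i) {
  ## read one pipe-separated file
  data1 <- read.table(paste0("Datasæt Pris ", i, ".csv"),
                      header = TRUE,
                      sep = "|")
  ## use LPERMNO as row names, then remove it as a column
  rownames(data1) <- data1$LPERMNO
  data1$LPERMNO <- NULL
  ## drop columns that are entirely NA; drop = FALSE keeps a data frame even if only one column is left
  data1 <- data1[, colSums(is.na(data1)) != nrow(data1), drop = FALSE ]
  ## replace the remaining NAs with 0
  data1[is.na(data1)] <- 0
  return( data1 )
})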
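## to combine everything into one data frame (rbind needs the same columns in every data set):
combined.data <- do.call( rbind, list.with.data.sets )
## if you have dplyr (bind_rows fills columns missing from a data set with NA):
library(dplyr)
combined.data <- bind_rows( list.with.data.sets )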
If you need to run cluster analysis on them individually, not in one big lump, then process them again using lapply:
your.clusters <- lapply( list.with.data.sets, function(d) {
cluster <- cluster.analysis.one.way.or.another(d)
return( cluster )
})
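The function name above is only a placeholder. As a rough sketch of what could go in its place, assuming hierarchical clustering on Euclidean distances suits your data (the question does not say which method you need), base R's dist(), hclust() and cutree() can be used. The toy data frames and k = 2 are made up for illustration:

```r
# Toy stand-ins for the cleaned data sets (hypothetical values)
list.with.data.sets <- list(
  data.frame(a = c(1, 1.1, 5, 5.2), b = c(2, 2.1, 9, 9.3)),
  data.frame(a = c(0, 0.2, 8, 8.1), b = c(1, 1.2, 7, 7.4))
)

your.clusters <- lapply(list.with.data.sets, function(d) {
  tree <- hclust(dist(d))   # hierarchical clustering on Euclidean distances
  cutree(tree, k = 2)       # cluster label per row; k = 2 is an assumption
})
```

your.clusters is then a list with one vector of cluster labels per data set, in the same order as list.with.data.sets.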
And so on. The whole idea is to work with a list of data sets rather than dynamically generated variable names.
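If you still want to refer to a data set by a name like Data42, you can name the list elements instead of creating separate variables. Here is a minimal, self-contained sketch. It writes two tiny made-up files to a temporary folder so it can run anywhere; the file names, values and the tmp.dir and ids variables are all hypothetical:

```r
# Write two tiny pipe-separated files to a temporary folder (made-up data)
tmp.dir <- tempdir()
ids <- 42:43
for (i in ids) {
  toy <- data.frame(LPERMNO = c(10001, 10002), p1 = c(1.5, NA), p2 = c(NA, NA))
  write.table(toy, file.path(tmp.dir, paste0("toy_", i, ".csv")),
              sep = "|", row.names = FALSE, quote = FALSE)
}

# Read and clean each file, collecting the results in a list
data.sets <- lapply(ids, function(i) {
  d <- read.table(file.path(tmp.dir, paste0("toy_", i, ".csv")),
                  header = TRUE, sep = "|")
  rownames(d) <- d$LPERMNO
  d$LPERMNO <- NULL
  d <- d[, colSums(is.na(d)) != nrow(d), drop = FALSE]  # drop all-NA columns
  d[is.na(d)] <- 0                                      # remaining NAs become 0
  d
})

# Name the list elements so each data set can be looked up by name
names(data.sets) <- paste0("Data", ids)

data.sets$Data42                  # look up one data set by name
data.sets[[paste0("Data", 43)]]   # same idea with a computed name
```

This gives you the convenience of named data sets without assign() or get(), and the list can still be passed straight to lapply.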