I need to do some manipulation on a dataset, but my R script (shown below) is running really slowly. The dataset is a data frame with dimensions 58347 x 41350. I first tried running the script on a much smaller dataset (58347 x 5) and it took an hour to process, so I imagine the full dataset will take far longer. Does anyone know a way to make it run faster?
Please see my code below:
library("LoomExperiment")
dataset<-import("WongAdultRetina homo_sapiens 2019-11-08 16.13.loom")
m<-assay(dataset)
colsums<-colSums(m)
result<-data.frame()
for(i in seq_len(nrow(m))){
if(i%%500==0){
print(paste("i =",i))
}
for(j in seq_len(ncol(m))){
if(colsums[j]== 0){
result[i,j]<- 0
}
else {
result[i,j]<-(m[i,j]*2000)/colsums[j]
}
}
}
save(result,file="resultlocal.rda")
Thank you so much.
It's hard to say what to do without understanding exactly what you're trying to achieve here. But I'll try.
First, you can replace data.frame with data.table. In my experience, data.tables are much faster to work with.
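For example, here is a minimal sketch (assuming the data.table package is installed, and using m and colsums from your script) that builds the result as a pre-sized data.table and fills it with set(), which updates cells by reference instead of copying the whole frame on every assignment:

library(data.table)

# pre-sized table of zeros; columns whose sum is zero can simply be left untouched
result <- as.data.table(matrix(0, nrow = nrow(m), ncol = ncol(m)))

for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    if (colsums[j] != 0) {
      # set() assigns by reference, so no copy of result is made
      set(result, i = i, j = j, value = (m[i, j] * 2000) / colsums[j])
    }
  }
}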
Second, you can create the result data.frame with a specified size. It looks like it will always be nrow(m) by ncol(m), so you can pre-allocate it with result <- as.data.frame(matrix(nrow = nrow(m), ncol = ncol(m))). Of course, you can use a data.table here too.
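As a sketch against your own loop (m and colsums as in your script), only the allocation changes; and since the frame starts out filled with zeros, the colsums[j] == 0 branch can be skipped entirely:

# allocate result at its final size once, instead of growing it cell by cell
result <- as.data.frame(matrix(0, nrow = nrow(m), ncol = ncol(m)))

for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    if (colsums[j] != 0) {
      result[i, j] <- (m[i, j] * 2000) / colsums[j]  # same assignment as before
    }
  }
}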
Specifying the size of the data.frame up front allocates enough memory for the whole object. That way, R doesn't have to grow the object (copy the contents of the original frame into a new, slightly larger object and then delete the original) every time another element is added.
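As a toy illustration of that difference (sizes and timings are only indicative), compare growing a one-column data.frame row by row with filling a pre-allocated one:

# growing: the frame is copied and extended on every assignment
grow <- function(n) {
  x <- data.frame()
  for (i in seq_len(n)) x[i, 1] <- i
  x
}

# pre-allocating: the full size is reserved once, then cells are filled in place
prealloc <- function(n) {
  x <- data.frame(v = numeric(n))
  for (i in seq_len(n)) x[i, 1] <- i
  x
}

system.time(grow(5000))
system.time(prealloc(5000))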