I have over a thousand MATLAB files that I want to read into R. I use the R.matlab package to read them, and I would like to parallelize the operation.
However, once I call the loop (I am generating a single data set from all the .mat files), I get an error:
Error in { : task 1 failed - "could not find function "readMat""
(I translated the quoted part of the error, since my R is not in English.)
Without the foreach command everything works fine, but it takes too long. Here is the code:
library(R.matlab)
library(plyr)
library(foreach)
library(doParallel)
a = list.files()
data <- readMat(a[1])
for (j in 2:length(a)) {
  data1 <- readMat(a[j])
  if (!is.null(data1)) {
    data <- rbind.fill(data, data1)
  }
  print(j)
}
With the foreach command I get the error above. Here is the code:
library(R.matlab)
library(plyr)
library(foreach)
library(doParallel)
cl<-makeCluster(8)
registerDoParallel(cl)
a = list.files()
data <- readMat(a[1])
foreach(j = 2:length(a)) %dopar% {
  data1 <- readMat(a[j])
  if (!is.null(data1)) {
    data <- rbind.fill(data, data1)
  }
  print(j)
}
Does it mean foreach and readMat should not be used together?
In case anyone is wondering, I forgot to export R.matlab to each cluster node. I just needed to add the .packages argument inside the foreach call:
library(R.matlab)
library(plyr)
library(foreach)
library(doParallel)
cl<-makeCluster(8)
registerDoParallel(cl)
a <- list.files()
data <- readMat(a[1])
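foreach(j = 2:length(a), .packages = c("plyr", "doParallel", "R.matlab")) %dopar% {
  # .packages loads these packages on each worker before the body runs
  data1 <- readMat(a[j])
  if (!is.null(data1)) {
    # note: this changes the worker's copy of `data`, not the one in the main session
    data <- rbind.fill(data, data1)
  }
}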
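One caveat with the loop above: each worker runs in its own R session, so assigning to `data` inside `%dopar%` does not grow the data set in the main session. A minimal sketch of one way around that is to return a value from each iteration and let `.combine` merge the results. The temporary files and the `mat_to_df` helper below are made up for illustration, and the helper assumes every variable in a file has the same length, which may not hold for real data.

```r
library(R.matlab)
library(plyr)
library(foreach)
library(doParallel)

# Write a few small .mat files to a temporary directory so the sketch is
# self-contained (stand-ins for the real files).
tmp <- tempfile("matdemo")
dir.create(tmp)
for (i in 1:4) {
  writeMat(file.path(tmp, sprintf("f%d.mat", i)),
           x = as.numeric(i * (1:3)),
           y = rep(as.numeric(i), 3))
}
a <- list.files(tmp, pattern = "\\.mat$", full.names = TRUE)

# Hypothetical helper: read one file and flatten each variable into a column.
mat_to_df <- function(path) {
  m <- readMat(path)
  as.data.frame(lapply(m, as.vector))
}

cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration returns a data frame; rbind.fill combines them in the main
# session, so only R.matlab has to be loaded on the workers.
data <- foreach(f = a, .combine = rbind.fill, .packages = "R.matlab") %dopar% {
  mat_to_df(f)
}

stopCluster(cl)
```

Because the combining happens in the main session, plyr and doParallel do not need to be listed in `.packages` here. Calling `clusterEvalQ(cl, library(R.matlab))` right after `makeCluster()` should be another way to make `readMat` visible on the workers.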