Problems reading MATLAB .mat files with a foreach loop in R


I have over a thousand MATLAB files that I want to read into R. I use the R.matlab package to read them, and I would like to parallelize the operation.

However, when I run the loop (I am building a single data set from all the .mat files), I get an error:

    Error in { : task 1 failed - "could not find function "readMat""

(My R is not in English, so I translated the quoted part of the message; the wording above is R's standard English message.)

Without foreach, everything works, but it takes too long. Here is the code:

    library(R.matlab)
    library(plyr)
    library(foreach)
    library(doParallel)

    a <- list.files()
    data <- readMat(a[1])

    for (j in 2:length(a)) {
      data1 <- readMat(a[j])

      # Append only files that returned something
      if (!is.null(data1)) {
        data <- rbind.fill(data, data1)
      }
      print(j)
    }
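The loop above grows `data` with one rbind.fill() call per file, which builds a new, larger data frame on every iteration. A common alternative, sketched here under the assumption that each file can be read independently, is to read every file into a list and combine once at the end. To keep the sketch runnable without .mat files, `read_one()` and the file names are hypothetical stand-ins for readMat() and list.files(), and base rbind() is used; plyr's rbind.fill() also accepts a list of data frames if the files contain different variables.

```r
# Hypothetical stand-in for readMat(): a small data frame per file,
# or NULL for a file with nothing usable in it.
read_one <- function(path) {
  if (path == "empty.mat") return(NULL)
  data.frame(file = path, value = nchar(path))
}

# Hypothetical file names standing in for list.files().
files <- c("a.mat", "empty.mat", "bb.mat", "ccc.mat")

# Read every file into a list; nothing is combined inside the loop.
parts <- lapply(files, read_one)

# Drop the NULL entries, then bind all pieces in a single call.
parts <- Filter(Negate(is.null), parts)
data <- do.call(rbind, parts)

print(data)
```

Because each element of the list is produced independently, this shape is also the one that maps naturally onto a parallel loop.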

With foreach, I get the error above. Here is the code:

    library(R.matlab)
    library(plyr)
    library(foreach)
    library(doParallel)

    cl <- makeCluster(8)
    registerDoParallel(cl)

    a <- list.files()
    data <- readMat(a[1])

    foreach(j = 2:length(a)) %dopar% {
      data1 <- readMat(a[j])

      if (!is.null(data1)) {
        data <- rbind.fill(data, data1)
      }
      print(j)
    }
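Apart from the missing-function error, the loop body above assigns to `data` inside %dopar%. With doParallel and a cluster from makeCluster(), each task runs in a separate R worker process, so that assignment changes a worker's copy and never reaches `data` in the master session; results have to be returned from each task and combined on the master. The minimal sketch below shows the effect with base R's parallel package and toy numbers instead of .mat files (`total` and `after_workers` are hypothetical variables used only for illustration):

```r
library(parallel)

cl <- makeCluster(2)  # two background R worker processes

total <- 0                  # lives in the master session
clusterExport(cl, "total")  # each worker receives its own copy

# Anti-pattern: `<<-` updates the copy of `total` in each worker's global
# environment; the master's `total` is not affected.
invisible(parLapply(cl, 1:4, function(j) {
  total <<- total + j
  NULL
}))
after_workers <- total
print(after_workers)  # still 0 on the master

# Pattern that works: each task returns its value and the master combines them.
parts <- parLapply(cl, 1:4, function(j) j)
total <- Reduce(`+`, parts)
print(total)  # 10

stopCluster(cl)
```

In foreach terms, the second pattern corresponds to returning a value from the loop body and letting the `.combine` argument (or a later combine step) assemble the results.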

Does this mean foreach and readMat cannot be used together?


Solution

  • In case anyone is wondering: I forgot to load R.matlab on each cluster worker. I just needed to add the .packages argument to the foreach call:

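    library(R.matlab)
    library(plyr)
    library(foreach)
    library(doParallel)

    cl <- makeCluster(8)
    registerDoParallel(cl)

    a <- list.files()

    # .packages loads R.matlab on every worker, so readMat() is found there.
    # Each iteration returns what it read, and .combine = rbind.fill merges
    # the results on the master. Assigning to `data` inside %dopar% would
    # only change a worker's local copy and leave `data` on the master as is.
    data <- foreach(j = seq_along(a), .packages = "R.matlab",
                    .combine = rbind.fill) %dopar% {
      readMat(a[j])
    }

    stopCluster(cl)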
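Why .packages is needed: a worker created by makeCluster() is a fresh R process that does not inherit the packages attached in the master session. The sketch below demonstrates this with base R's parallel package, using splines (which ships with R) as a stand-in for R.matlab and bs() as a stand-in for readMat(). Calling clusterEvalQ(cl, library(...)) loads the package on every worker, which is the effect foreach's .packages argument is documented to have.

```r
library(parallel)
library(splines)  # attached in the master session only

cl <- makeCluster(2)

# A fresh worker starts without the master's attached packages,
# so the splines function bs() is not visible there yet.
before <- unlist(clusterEvalQ(cl, exists("bs")))

# Load the package on every worker, then check again.
invisible(clusterEvalQ(cl, library(splines)))
after <- unlist(clusterEvalQ(cl, exists("bs")))

print(before)  # FALSE for each worker
print(after)   # TRUE for each worker

stopCluster(cl)
```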