r, parallel-processing, sapply

How do I parallelize my sapply function using a big.matrix? (object of type 'S4' is not subsettable error)


This is my first post/question, so apologies if I do anything wrong; just let me know and I'll fix it. I am trying to use parSapply to implement a function that computes the mean of a weighted vector (I'm only using the mean to get it working; I want to do other things later), but I keep receiving this error:

4 nodes produced errors; first error: object of type 'S4' is not subsettable

I am using a big.matrix called PUBG_stats and trying to apply the function to its 8th column, together with a partitioning vector called partition, in the code below. How do I convert my data from S4 to a class that works, or is there another way to do this? I've used R a good bit, but I am new to parallel.

library(parallel)
ncores<-detectCores()
cl <- makeCluster(ncores-1) 
clusterExport(cl, c("PUBG_stats","partition"))
system.time(parLapply(cl, 1:4, loopi, y=x1, partid=partid1))
parSapply(cl,1:5,function(x)(sum(PUBG_stats[,8][partition==1]*rand_vec(length(PUBG_stats[,8][partition==1]),N))/N)) 

rand_vec is just a function that creates weights, and the 1:5 is there because I want to repeat the computation r times, for whatever r is needed.

A reproducible example that has the same issue would be the below:

library(bigmemory)
library(parallel)
a<- as.big.matrix(rnorm(100000))
ncores <- detectCores()  
cl <- makeCluster(ncores) 
clusterExport(cl, c("a","sum","rnorm"))

parSapply(cl,1:5,function(x)(sum(a[,]*rnorm(1))))

stopCluster(cl)

Error:

    object of type 'S4' is not subsettable

Solution

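  • The original error occurs because bigmemory is not loaded on the workers, so they have no `[` method for big.matrix objects. The proper approach is to also load all required packages on the workers, which you can do with clusterEvalQ(), e.g.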

    library(bigmemory)
    library(parallel)
    a <- as.big.matrix(rnorm(100000))
    
    ## Setup workers
    ncores <- 2
    cl <- makeCluster(ncores)
    clusterExport(cl, c("a","sum","rnorm"))
    ignore <- clusterEvalQ(cl, { library(bigmemory) })
    
    res <- parSapply(cl, 1:5, function(x) { sum(a[,]*rnorm(1)) })
    
    stopCluster(cl)
    

    However, if we try to run the above code, we'll get:

    > res <- parSapply(cl, 1:5, function(x) { sum(a[,]*rnorm(1)) })
    Error in checkForRemoteErrors(val) : 
      5 nodes produced errors; first error: external pointer is not valid
    

    This is because a big.matrix object wraps an external pointer to memory that is only valid in the R process that created it, so the object itself cannot be exported to other R processes (here, the workers). That is a limitation of how those objects work. I don't know of a way to make a direct export work; if there were one, it would be up to the author of bigmemory to provide it.
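
A workaround that may help here is to export a descriptor instead of the matrix itself, and have each worker re-attach to the same data. The sketch below uses bigmemory's describe() and attach.big.matrix() for this. It assumes the matrix is backed by shared memory (which I believe is the default for as.big.matrix()) and that all workers run on the same machine as the main session; treat it as an illustration rather than a tested recipe for the PUBG_stats data.

```r
library(bigmemory)
library(parallel)

a <- as.big.matrix(rnorm(100000))

## A descriptor is a small, serializable object that records where the
## data lives; unlike the big.matrix itself, it can be exported.
desc <- describe(a)

## Setup workers (assumption: local workers on the same machine)
ncores <- 2
cl <- makeCluster(ncores)
clusterExport(cl, "desc")

## On each worker: load bigmemory and attach to the shared data.
## Return NULL so no big.matrix is sent back to the main session.
invisible(clusterEvalQ(cl, {
  library(bigmemory)
  a <- attach.big.matrix(desc)
  NULL
}))

## 'a' now refers to each worker's own attached copy of the handle
res <- parSapply(cl, 1:5, function(x) { sum(a[, ] * rnorm(1)) })

stopCluster(cl)
```

If the data fits comfortably in memory, another option is to extract the column you need as an ordinary numeric vector on the main session (e.g. PUBG_stats[, 8]) and export that vector instead, at the cost of copying it to every worker.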