Tags: r, parallel-processing, snow

Does clusterMap in Snow support dynamic processing?


It seems clusterMap in snow doesn't support dynamic (load-balanced) scheduling. I'd like to do parallel computing over pairs of parameters stored in a data frame, but the elapsed times of the individual jobs vary a great deal. If the jobs are assigned statically, the whole run is slowed down by the unlucky workers.

e.g.

library(snow)
cl2 <- makeCluster(3, type = "SOCK")
df_t <- data.frame(type  = c(rep('a', 3), rep('b', 3)),
                   value = c(rep('1', 3), rep('2', 3)))
clusterExport(cl2, "df_t")
clusterMap(cl2, function(x, y) paste(x, y),
           df_t$type, df_t$value)

Solution

  • It is true that clusterMap doesn't support dynamic processing, but there is a comment in the source code suggesting that it might be implemented in the future.

    In the meantime, I would create a list from the data in order to call clusterApplyLB with a slightly different worker function:

    ldf <- lapply(seq_len(nrow(df_t)), function(i) df_t[i,])
    clusterApplyLB(cl2, ldf, function(df) {paste(df$type, df$value)})
    

    This was a common approach before clusterMap was added to the snow package.

    Note that your use of clusterMap doesn't actually require you to export df_t, since your worker function doesn't refer to it. But if you're willing to export df_t to the workers, you could also use:

    clusterApplyLB(cl2, 1:nrow(df_t), function(i){paste(df_t$type[i],df_t$value[i])})
    

    In this case, df_t must be exported to the cluster workers since the worker function references it. However, this is generally less efficient, since each worker only needs a fraction of the entire data frame.
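
Putting the pieces together, here is a minimal self-contained sketch of the first approach. The Sys.sleep call with a random duration is an assumption standing in for real work with variable run time, to illustrate why load balancing matters: clusterApplyLB hands the next task to whichever worker finishes first, instead of pre-assigning a fixed batch to each worker.

```r
library(snow)

cl2 <- makeCluster(3, type = "SOCK")

df_t <- data.frame(type  = c(rep('a', 3), rep('b', 3)),
                   value = c(rep('1', 3), rep('2', 3)))

# One list element per row, so each row becomes one task
ldf <- lapply(seq_len(nrow(df_t)), function(i) df_t[i, ])

# clusterApplyLB schedules dynamically: a worker gets a new task
# as soon as it is free, so slow tasks don't stall the others.
res <- clusterApplyLB(cl2, ldf, function(df) {
  Sys.sleep(runif(1, 0, 0.5))   # simulated variable job length
  paste(df$type, df$value)
})

stopCluster(cl2)
unlist(res)  # results come back in input order: "a 1" ... "b 2"
```

Despite the out-of-order execution, clusterApplyLB places each result at the position of its input element, so unlist(res) lines up with the rows of df_t.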