Search code examples
rparallel-processingmclapply

Passing mclapply() a parameter from for (i in range)


I'm trying to do this:

nmf.sub <- function(n){
sub.data.matrix <- data.matrix[, (index[n, ])] ## the index is a permutation of the original matrix at a 0.8 resampling proportion (doesn't really matter)
temp.result <- nmf(sub.data.matrix, rank = 2, seed = 12345) ## want to change 2 to i
return(temp.result)
}

class.list <- list()
for (i in nmf.rank){ ## nmf.rank is 2:4
results.list <- mclapply(mc.cores = 16, 1:resamp.iterations, function(n) nmf.sub(n)) ## resamp.iterations is 10, nmf.sub is defined above
}

But instead of having rank = 2 in the nmf for temp.result, I want to have rank = i

Any idea how I could pass it that parameter? Just passing it through mclapply as function(n, i) doesn't work.


Solution

  • You seemingly have two loops: one for i in nmf.rank and one for n in 1:resamp.iterations. Therefore, you need to pass both i and n to nmf.sub e.g. like in:

    nmf.sub <- function(n, i){
        ## the index is a permutation of the original matrix at a 0.8
        ## resampling proportion (doesn't really matter)
        sub.data.matrix <- data.matrix[, (index[n, ])] 
        ## want to change 2 to i
        temp.result <- nmf(sub.data.matrix, rank = i, seed = 12345)
        return(temp.result)
    }
    
    
    resamp.iterations <- 10
    nmf.rank <- 2:4
    
    res <- lapply(nmf.rank, function(i){
        results.list <- mclapply(mc.cores = 16, 1:resamp.iterations,
                                 function(n) nmf.sub(n,i))
    })
    ## then you can flatten/reshape res
    

    Regarding your comment (below) about efficiency: the bulk of the numerical calculations is performed within the nmf() function, therefore the loop is properly set up, in the sense that each process/core gets a numerically intensive job. However, to speed up computation you might consider using the previously computed result, instead of the seed 12345 (unless using the latter seed is mandatory for some reason related to your problem). In the following example I get a 30-40% reduction in execution time:

    library(NMF)
    RNGkind("L'Ecuyer-CMRG") ## always use this when using mclapply()
    nr <- 19
    nc <- 2e2
    set.seed(123)
    data.matrix <- matrix(rexp(nc*nr),nr,nc)
    
    resamp.iterations <- 10
    nmf.rank <- 2:4
    
    index <- t(sapply(1:resamp.iterations, function(n) sample.int(nc,nc*0.8)))
    
    
    nmf.sub <- function(n, i){
        sub.data.matrix <- data.matrix[ ,index[n, ]] 
        temp.result <- nmf(sub.data.matrix, rank = i, seed = 12345)
        return(temp.result)
    }
    
    ## version 1
    system.time({
        res <- lapply(nmf.rank, function(i){
            results.list <- mclapply(mc.cores = 16, 1:resamp.iterations,
                                     function(n) nmf.sub(n,i))
        })
    })
    
    ## version 2: swap internal and external loops
    system.time({
        res <- 
            mclapply(mc.cores=16, 1:resamp.iterations, function(n){
                res2 <- nmf(data.matrix[ ,index[n, ]], rank=2, seed = 12345)
                res3 <- nmf(data.matrix[ ,index[n, ]], rank=3, seed = 12345)
                res4 <- nmf(data.matrix[ ,index[n, ]], rank=4, seed = 12345)
                list(res2,res3,res4)
            })
    })
    
    ## version 3: use previous calculation as starting point
    ##   ==> 30-40% reduction in computing time
    system.time({
        res <- 
            mclapply(mc.cores=16, 1:resamp.iterations, function(n){
                res2 <- nmf(data.matrix[ ,index[n, ]], rank=2, seed = 12345)
                res3 <- nmf(data.matrix[ ,index[n, ]], rank=3, seed = res2)
                res4 <- nmf(data.matrix[ ,index[n, ]], rank=4, seed = res3)
                list(res2,res3,res4)
            })
    })