Tags: r, multithreading, cpu-cores, dirichlet

Can `dmn {DirichletMultinomial}` be run on multiple cpu cores in R?


I analyse microbiome data using

library(phyloseq)
library(microbiome)
library(DirichletMultinomial)

and several other libraries. Fitting Dirichlet-Multinomial models to count data with dmn {DirichletMultinomial} takes quite a long time. Can the computation be run on multiple CPU cores in R? I tried:

dat <- abundances(pseq)
count <- as.matrix(t(dat))
fit <- lapply(1:25, dmn, count = count, verbose=TRUE)

replacing with:

library(parallel)
numCores <- detectCores()
...
fit <- mclapply(1:25, dmn, count = count, verbose=TRUE, mc.cores = numCores)

but it returns an error:

Warning message:
In mclapply(1:25, dmn, count = count, verbose = TRUE, mc.cores = numCores) :
  all scheduled cores encountered errors in user code

I am using

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)
> detectCores()
[1] 4

Can anyone help?

Best regards, Marcin


Solution

  • Yes, it is possible to run on multiple cores, as illustrated in section 2 of the vignette http://bioconductor.org/packages/release/bioc/vignettes/DirichletMultinomial/inst/doc/DirichletMultinomial.pdf and in your own code.

    Probably what is happening is that dmn raises an error for some of the values of X (here, 1:25); what is the value of fit? Also, one might try

    library(BiocParallel)
    # count is passed by name so that each value of 1:25 is matched to dmn's k argument
    fit <- bplapply(1:25, dmn, count = count, BPPARAM = MulticoreParam(numCores))
    

    fit will be an object that can be queried (see the BiocParallel vignette available from https://bioconductor.org/packages/BiocParallel) for more error information.
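
One way to see which values of k fail, and why, is to catch the error inside each worker and return it as a value. The following is only a minimal sketch: toy_fit is a hypothetical stand-in for dmn (made to fail for k = 3 so the example runs without Bioconductor installed), safely is a helper name invented here, and mc.cores = 2 assumes a Unix-alike such as macOS or Linux.

```r
library(parallel)

# Hypothetical stand-in for dmn(): it fails for k = 3 so the sketch runs
# without Bioconductor packages. In real use, pass dmn instead.
toy_fit <- function(count, k) {
  if (k == 3) stop("toy failure for k = 3")
  list(k = k, n_samples = nrow(count))
}

# Wrap a fitting function so an error is returned as a value, not raised.
# The component count k comes first because lapply/mclapply supply it from X.
safely <- function(fun) {
  function(k, ...) tryCatch(fun(k = k, ...), error = function(e) e)
}

count <- matrix(1:6, nrow = 3)  # toy 3 x 2 count matrix

# mc.cores = 2 assumes a Unix-alike; forking is not available on Windows.
fit <- mclapply(1:4, safely(toy_fit), count = count, mc.cores = 2)

# Find the values of k that failed and collect their error messages.
failed <- which(vapply(fit, inherits, logical(1), what = "error"))
msgs <- vapply(fit[failed], conditionMessage, character(1))
print(failed)
print(msgs)
```

With the real data this would become something like mclapply(1:25, safely(dmn), count = count, verbose = TRUE, mc.cores = numCores); the messages in msgs should then show what dmn is actually complaining about.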