Tags: r, parallel-processing, mixed-models, glmmtmb, glmm

R package glmmTMB - model family nbinom2 - Error in MakeADFunObject


I could use some quick assistance with fitting a negative binomial model in R using the "glmmTMB" package with the family set to nbinom2. I opted for glmmTMB because it allows specifying fixed and random effects in both the main and zero-inflated parts of the formula, and it also supports parallel computing.

library(glmmTMB)

nt <- parallel::detectCores() - 1
neg_bin <- glmmTMB(eq_main,               # eq_main has both fixed and random effects, and an offset term in the fixed effects
                   data = x,
                   ziformula = eq_zeros,  # eq_zeros has only fixed effects; no random effects
                   family = nbinom2,
                   REML = TRUE,
                   control = glmmTMBControl(parallel = nt))

However, I've hit a roadblock with the following error:

"Error in MakeADFunObject(data, parameters, reportenv, ADreport = ADreport, : Caught exception 'std::bad_alloc' in function 'MakeADFunObject'"

Can someone shed light on what this error means and suggest steps I could take to resolve it? I would like to think that it is not a memory issue, because I am using the most state-of-the-art machine I was able to get my hands on (a.k.a. a supercomputer). Thanks in advance for your help!

NOTE: I recognize the importance of sharing a reproducible example, but due to the extensive size of the dataset (comprising several hundred variables), I'm currently refraining from providing one. If the community deems it necessary, I am more than willing to share a reproducible example upon request.


Solution

  • This is a "running-out-of-memory" error: std::bad_alloc is the exception that C++ code throws when a memory allocation fails.

    • It would help a lot if you told us more about the dimensions of your problem: how many observations in total? How many variables, and in particular how many factor variables, with how many levels each? In particular, what is
    dim(model.matrix(lme4::nobars(eq_main), data = x))

    (and the equivalent for your zero-inflation model formula; see the first sketch after this list)?

    • "most state-of-the-art machine I was able to get my hands on" is actually not very descriptive; how much RAM is available? (If you are running this in a high-performance-computing (HPC) facility, how much memory have you requested for the job?)
    • Can you try running some examples with small subsets of your data (subsetting predictor variables, observations, or both), and see (1) how big a subset you can successfully run and (2) how the memory requirements and computing time scale with problem size? (The peakRAM package is useful for this: it will report the elapsed time, memory used, and peak memory usage. See the second sketch after this list.)
    • I would be surprised if parallelizing is affecting your memory usage (glmmTMB uses OpenMP, which is a shared-memory approach), but it couldn't hurt to try without parallelizing to see if it makes a difference.
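
    For the dimension check above, here is a minimal sketch, assuming x, eq_main, and eq_zeros are the objects from the question (and that eq_zeros is a fixed-effects-only formula, so nobars() is not needed for it):

    ## conditional-model fixed-effect design matrix (nobars() strips the random-effect terms)
    dim(model.matrix(lme4::nobars(eq_main), data = x))
    ## the equivalent for the zero-inflation formula
    dim(model.matrix(eq_zeros, data = x))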
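
    For the subset-scaling test, a sketch using the peakRAM package (again assuming the objects from the question; the subset fractions are arbitrary, and the parallel control is deliberately left out so its effect can be ruled out):

    library(glmmTMB)
    library(peakRAM)
    ## fit on increasing fractions of the rows and watch how elapsed time
    ## and peak RAM grow with problem size
    for (f in c(0.05, 0.1, 0.2)) {
        xs <- x[sample(nrow(x), round(f * nrow(x))), ]
        print(peakRAM(
            glmmTMB(eq_main, data = xs, ziformula = eq_zeros,
                    family = nbinom2, REML = TRUE)
        ))
    }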

    The only suggestion I can make off the top of my head (without further information) that might help is to try sparseX = TRUE in your glmmTMB() call: if you have a lot of factor variables with many levels, they will be expanded into many model-matrix columns containing mostly zeros, and using sparse matrices could reduce the memory footprint of your problem.
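
    A minimal sketch of that call, assuming the objects from the question (sparseX is documented as a named logical vector with possible elements "cond", "zi", and "disp", so the named form is used here):

    neg_bin_sparse <- glmmTMB(eq_main,
                              data = x,
                              ziformula = eq_zeros,
                              family = nbinom2,
                              REML = TRUE,
                              control = glmmTMBControl(parallel = nt),
                              ## use sparse fixed-effect model matrices for the
                              ## conditional and zero-inflation components
                              sparseX = c(cond = TRUE, zi = TRUE))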