Search code examples
rgamma-distribution

Error in gammamixEM... Try different number of components? in R


I am trying to fit a 2 component gamma mixture model to my data (residual values obtained after running Generalized Linear Model), using following command (part of the code):

  expr_mix_gamma <- gammamixEM(expr_glm_residuals, lambda = c(0.75,0.25), k = 2, epsilon = 1e-08, maxit = 1000, maxrestarts=20, verb = TRUE)

The code runs for multiple gene files (in loop). it runs fine for some files whereas for others it throws following error:

Error in gammamixEM(expr_glm_residuals, lambda = c(0.75, 0.25), k = 2,  : Try different number of components?     

I am unable to figure out what is going on. could anyone throw some light on the same? Thanks


Solution

  • Few expert opinions:

    1) Fitting normal mixtures can be difficult, and sometime the optimization algorithm (EM) will get stuck with very slow convergence. Presumably there are options in the package to either increase the max number of steps before giving up or make the convergence criteria less sensitive. The former will increase the run time and the latter will reduce the optimality (possibly leaving you farther from the true optimum). So you should look into changing these as you think appropriate.

    2) From Martin Maechler, ETH Zurich: If it is a case of mixtures of one-dimensional gaussians, it is also strongly recommended looking at CRAN package 'nor1mix' (the "1" is for "one-dimensional). For a while now that small package is providing an alternative to the EM, namely direct MLE, simply using optim() where the likelihood uses a somewhat smart parametrization. Of course, as the EM, this also depends on the starting value, but my (limited) experience has been that nor1mix::norMixMLE() works considerably faster and more reliable than the EM (which I also provide as nor1mix::norMixEM() .

    Apropos 'starting value': The help page shows how to use kmeans() for "somewhat" reliable starts; alternatively, I'd recommend using cluster::pam() to get a start there.

    For me trying first approach did better for some genes, but still kept failing for others. So, finally, I used tryCatch to get my script up and running, loosing information for some genes.