R: mix() in mixdist package returning error


I have installed the mixdist package in R to combine distributions. Specifically, I'm using the mix() function. See documentation. When I call it, I get:

Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist, : missing value in parameter

I googled the error message, but no useful results popped up.

My first argument to mix() is a data frame called data.df. It is formatted exactly like the built-in data set pike65. I also did data.df <- as.mixdata(data.df).

My second argument has two rows. It is a data frame called datapar, formatted exactly like pikepar. My pi values are 0.5 and 0.5. My mu values are 250 and 463 (based on my data set). My sigma values are 0.5 and 1.

My call to mix() looks like:
fitdata <- mix(data.df, datapar, "norm", constr = mixconstr(consigma="CCV"), emsteps = 3, print.level = 2)

The printing shows that my pi values go from 0.5 to NaN after the first iteration, and that my gradient is becoming 0.

I would appreciate any help in sorting out this error.

Thanks,
n.i.


Solution

  • Using the test data you linked to

    library(mixdist) 
    time <- seq(673,723) 
    counts <-c(3,12,8,12,18,24,39,48,64,88,101,132,198,253,331,
       419,563,781,1134,1423,1842,2505,374,6099,9343,13009, 
       15097,13712,9969,6785,4742,3626,3794,4737,5494,5656,4806,
       3474,2165,1290,799,431,213,137,66,57,41,35,27,27,27) 
    data.df <- data.frame(time=time, counts=counts) 
    

    We can see that

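    data.mix <- as.mixdata(data.df)          # convert to the mixdata class that mix() expects
    startparam <- mixparam(c(699, 707), 1)   # starting means 699 and 707, common sigma of 1
    data.fit <- mix(data.mix, startparam, "norm")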
    

    gives the same error. The error appears to be closely tied to the data, so the reason this data fails may differ from the reason yours fails, but this is the only example you provided.

    The problem with this data is that the probabilities under the two groups become indistinguishable at some point. When that happens, the "E" step of the algorithm cannot estimate the pi variable properly. Here

    pnorm(717,707,1)
    # [1] 1
    pnorm(717,699,1)
    # [1] 1
    

    both are exactly 1, and this seems to be causing the error. When mix takes 1 minus this value and uses the ratio to estimate group membership, it gets NaN values, which propagate to the estimated proportions. When these NaN values are passed internally to nlm() to do the estimation, you get the error message

    Error in nlm(mixlike, lmixdat = mixdat, lmixpar = fitpar, ldist = dist,  : 
      missing value in parameter
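
    The 0/0 behind those NaN values can be reproduced in a few lines of base R. This is only an illustration of the arithmetic described above, not mixdist's internal code, and the variable names are made up for the example. The last part shows that asking pnorm() for the upper tail directly (lower.tail = FALSE) keeps the tiny tail probabilities instead of rounding them away.

```r
# Illustration only: this is NOT how mixdist is implemented internally.
x     <- 717
mu    <- c(699, 707)   # starting means used in the example above
sigma <- 1

# Upper-tail mass computed as 1 - CDF: both CDF values round to exactly 1,
# so both tails come out as exactly 0.
tail_naive <- 1 - pnorm(x, mu, sigma)
print(tail_naive)

# Each component's share of that tail is then 0 / 0, i.e. NaN.
share_naive <- tail_naive / sum(tail_naive)
print(share_naive)

# Requesting the upper tail directly avoids the subtraction from 1,
# so the (very small) probabilities stay non-zero and the shares are finite.
tail_direct  <- pnorm(x, mu, sigma, lower.tail = FALSE)
share_direct <- tail_direct / sum(tail_direct)
print(share_direct)
```

    Once a NaN like share_naive reaches the proportions, every later step that uses them inherits it.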
    

    The same error message can be replicated with

    f <- function(x) sum((x-1:length(x))^2)
    nlm(f, c(10,10))
    nlm(f, c(10,NaN)) #error
    

    So it appears the mixdist package will not work in this scenario. You may wish to contact the package maintainer to see if they are aware of the problem. In the meantime, you will need to find another way to estimate the parameters of your mixture model.
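
    As a rough sketch of one such alternative, here is a small hand-written EM loop in base R. It rests on assumptions that mix() does not make: each time value is treated as a point observed counts times (mix() fits grouped data), the two sigmas are estimated freely, and the loop runs a fixed number of iterations with no convergence check. The function name em_weighted and its arguments are hypothetical. The E step is done on the log scale so that the responsibilities do not turn into 0/0 in the tails.

```r
# Sketch only: weighted EM for a two-component normal mixture.
# Assumption: each `time` value is a point observed `counts` times.
time <- seq(673, 723)
counts <- c(3,12,8,12,18,24,39,48,64,88,101,132,198,253,331,
            419,563,781,1134,1423,1842,2505,374,6099,9343,13009,
            15097,13712,9969,6785,4742,3626,3794,4737,5494,5656,4806,
            3474,2165,1290,799,431,213,137,66,57,41,35,27,27,27)

em_weighted <- function(x, w, mu, sigma, prop, iters = 100) {
  for (i in seq_len(iters)) {
    # E step: log of (proportion * density), one column per component
    logp <- sapply(seq_along(mu), function(k)
      log(prop[k]) + dnorm(x, mu[k], sigma[k], log = TRUE))
    # Subtract the row maximum before exponentiating (log-sum-exp trick),
    # so at least one entry per row is exp(0) = 1 and the row sum is never 0
    r <- exp(logp - apply(logp, 1, max))
    r <- r / rowSums(r)

    # M step: count-weighted updates of proportions, means and sds
    nk    <- colSums(w * r)
    prop  <- nk / sum(nk)
    mu    <- colSums(w * r * x) / nk
    sigma <- sqrt(colSums(w * r * outer(x, mu, "-")^2) / nk)
  }
  list(prop = prop, mu = mu, sigma = sigma)
}

# Same starting values as the mix() call above: means 699 and 707, sigma 1
fit <- em_weighted(time, counts, mu = c(699, 707), sigma = c(1, 1),
                   prop = c(0.5, 0.5))
print(fit)
```

    Because of the point-mass assumption, the estimates are not guaranteed to match what mix() would report on data it can fit, so treat the result as a starting point rather than a drop-in replacement.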