Search code examples
rcdfweibullmle

Errors running Maximum Likelihood Estimation on a three parameter Weibull cdf


I am working with the cumulative emergence of flies over time (taken at irregular intervals) over many summers (though first I am just trying to make one year work). The cumulative emergence follows a sigmoid pattern and I want to create a maximum likelihood estimation of a 3-parameter Weibull cumulative distribution function. The three-parameter models I've been trying to use in the fitdistrplus package keep giving me an error. I think this must have something to do with how my data is structured, but I cannot figure it out. Obviously I want it to read each point as an x (degree days) and a y (emergence) value, but it seems to be unable to read two columns. The main error I'm getting says "Non-numeric argument to mathematical function" or (with slightly different code) "data must be a numeric vector of length greater than 1". Below is my code including added columns in the df_dd_em dataframe for cumulative emergence and percent emergence in case that is useful.

    degree_days <-   c(998.08,1039.66,1111.29,1165.89,1236.53,1293.71,
                      1347.66,1387.76,1445.47,1493.44,1553.23,1601.97,
                      1670.28,1737.29,1791.94,1849.20,1920.91,1967.25,
                      2036.64,2091.85,2152.89,2199.13,2199.13,2263.09,
                      2297.94,2352.39,2384.03,2442.44,2541.28,2663.90,
                      2707.36,2773.82,2816.39,2863.94)
    emergence <-  c(0,0,0,1,1,0,2,3,17,10,0,0,0,2,0,3,0,0,1,5,0,0,0,0,
                   0,0,0,0,1,0,0,0,0,0)
    cum_em <- cumsum(emergence)
    df_dd_em <- data.frame (degree_days, emergence, cum_em)
    df_dd_em$percent <- ave(df_dd_em$emergence, FUN = function(df_dd_em) 100*(df_dd_em)/46)
    df_dd_em$cum_per <- ave(df_dd_em$cum_em, FUN = function(df_dd_em) 100*(df_dd_em)/46)
    x <- pweibull(df_dd_em[c(1,3)],shape=5)
    dframe2.mle <- fitdist(x, "weibull",method='mle')

Solution

  • Here's my best guess at what you're after:

    Set up data:

    dd <- data.frame(degree_days=c(998.08,1039.66,1111.29,1165.89,1236.53,1293.71,
                          1347.66,1387.76,1445.47,1493.44,1553.23,1601.97,
                          1670.28,1737.29,1791.94,1849.20,1920.91,1967.25,
                          2036.64,2091.85,2152.89,2199.13,2199.13,2263.09,
                          2297.94,2352.39,2384.03,2442.44,2541.28,2663.90,
                          2707.36,2773.82,2816.39,2863.94),
                     emergence=c(0,0,0,1,1,0,2,3,17,10,0,0,0,2,0,3,0,0,1,5,0,0,0,0,
                     0,0,0,0,1,0,0,0,0,0))
    dd <- transform(dd,cum_em=cumsum(emergence))
    

    We're actually going to fit to an "interval-censored" distribution (i.e. probability of emergence between successive degree day observations: this version assumes that the first observation refers to observations before the first degree-day observation, you could change it to refer to observations after the last observation).

    library(bbmle)
    ## y*log(p) allowing for 0/0 occurrences:
    y_log_p <- function(y,p) ifelse(y==0 & p==0,0,y*log(p))
    NLLfun <- function(scale,shape,x=dd$degree_days,y=dd$emergence) {
        prob <- pmax(diff(pweibull(c(-Inf,x),      ## or (c(x,Inf))
                 shape=shape,scale=scale)),1e-6)
        ## multinomial probability
        -sum(y_log_p(y,prob))
    }    
    library(bbmle)
    

    I should probably have used something more systematic like the method of moments (i.e. matching the mean and variance of a Weibull distribution with the mean and variance of the data), but I just hacked around a bit to find plausible starting values:

    ## preliminary look (method of moments would be better)
    scvec <- 10^(seq(0,4,length=101))
    plot(scvec,sapply(scvec,NLLfun,shape=1))
    

    It's important to use parscale to let R know that the parameters are on very different scales:

    startvals <- list(scale=1000,shape=1)
    m1 <- mle2(NLLfun,start=startvals,
         control=list(parscale=unlist(startvals)))
    

    Now try with a three-parameter Weibull (as originally requested) -- requires only a slight modification of what we already have:

    library(FAdist)
    NLLfun2 <- function(scale,shape,thres,
                        x=dd$degree_days,y=dd$emergence) {
        prob <- pmax(diff(pweibull3(c(-Inf,x),shape=shape,scale=scale,thres)),
                     1e-6)
        ## multinomial probability
        -sum(y_log_p(y,prob))
    }    
    startvals2 <- list(scale=1000,shape=1,thres=100)
    m2 <- mle2(NLLfun2,start=startvals2,
         control=list(parscale=unlist(startvals2)))
    

    Looks like the three-parameter fit is much better:

    library(emdbook)
    AICtab(m1,m2)
    ##    dAIC df
    ## m2  0.0 3 
    ## m1 21.7 2 
    

    And here's the graphical summary:

    with(dd,plot(cum_em~degree_days,cex=3))
    with(as.list(coef(m1)),curve(sum(dd$emergence)*
                                 pweibull(x,shape=shape,scale=scale),col=2,
                                 add=TRUE))
    with(as.list(coef(m2)),curve(sum(dd$emergence)*
                                 pweibull3(x,shape=shape,
                                           scale=scale,thres=thres),col=4,
                                 add=TRUE))
    

    enter image description here

    (could also do this more elegantly with ggplot2 ...)

    • These don't seem like spectacularly good fits, but they're sane. (You could in principle do a chi-squared goodness-of-fit test based on the expected number of emergences per interval, and accounting for the fact that you've fitted a three-parameter model, although the values might be a bit low ...)
    • Confidence intervals on the fit are a bit of a nuisance; your choices are (1) bootstrapping; (2) parametric bootstrapping (resample parameters assuming a multivariate normal distribution of the data); (3) delta method.
    • Using bbmle::mle2 makes it easy to do things like get profile confidence intervals:

     confint(m1)
     ##             2.5 %      97.5 %
     ## scale 1576.685652 1777.437283
     ## shape    4.223867    6.318481