Search code examples
rglmlme4convergence

warning messages in lme4 for survival analysis that did not arise 3 years ago


I am trying to fit a generalized linear mixed-effects model to my data, using the lme4 package.

The data can be described as follows (see example below): Survival data of fish over 28 days. Explanatory variables in the example data set are:

  • Region This is the geographical region from which the larvae originated.
  • treatment The temperatures at which sub-samples of fish from each region were raised.
  • replicate One of three replications of the entire experiment
  • tub Random variable. 15 tubs (used to maintain experimental temperatures in aquaria) in total (3 replicates for each of 5 temperature treatments). Each tub contained 1 aquaria for each Region (4 aquaria in total) and was located randomly in the lab.
  • Day Self explanatory, number of days from the start of the experiment.
  • stage is not being used in the analysis. Can be ignored.

Response variable

  • csns cumulative survival. i.e remaining fish/initial fish at day 0.
  • start weights used to tell the model that the probability of survival is relative to this number of fish at start of experiment.
  • aquarium Second random variable. This is the unique ID for each individual aquaria containing the value of each factor that it belongs to. e.g. N-14-1 means Region N, Treatment 14, replicate 1.

My problem is unusual, in that I have fitted the following model before:

   dat.asr3<-glmer(csns~treatment+Day+Region+
      treatment*Region+Day*Region+Day*treatment*Region+
      (1|tub)+(1|aquarium),weights=start, 
      family=binomial, data=data2)  

However, now that I am attempting to re-run the model, to generate analyses for publication, I am getting the following errors with the same model structure and package. Output is listed below:

> Warning messages:  

1: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 1.59882 (tol = 0.001, component >1)
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
- Rescale variables?

My understanding is the following:

Warning message 1.

non-integer #success in a binomial glm refers to the proportion format of the csns variable. I have consulted several sources, here included, github, r-help, etc, and all suggested this. The research fellow that assisted me in this analysis 3 years ago, is unreachable. Can it have to do with changes in lme4 package over the last 3 years?

Warning message 2.

I understand this is a problem because there are insufficient data points to fit the model to, particularly at
L-30-1, L-30-2 and L-30-3,
where only two observations are made:
Day 0 csns=1.00 and Day 1 csns=0.00
for all three aquaria. Therefore there is no variability or sufficient data to fit the model to.

Nevertheless, this model in lme4 has worked before, but doesn't run without these warnings now.

Warning message 3

This one is entirely unfamiliar to me. Never seen it before.

Sample data:

Region treatment replicate tub Day stage csns start aquarium  
 N        14         1  13   0    1 1.00   107   N-14-1
 N        14         1  13   1    1 1.00   107   N-14-1
 N        14         1  13   2    1 0.99   107   N-14-1
 N        14         1  13   3    1 0.99   107   N-14-1
 N        14         1  13   4    1 0.99   107   N-14-1
 N        14         1  13   5    1 0.99   107   N-14-1

The data in question 1005cs.csv is available here via we transfer: http://we.tl/ObRKH0owZb

Any help with deciphering this problem, would be greatly appreciated. Also any alternative suggestions for suitable packages or methods to analyse this data would be great too.


Solution

  • tl;dr the "non-integer successes" warning is accurate; it's up to you to decide whether fitting a binomial model to these data really makes sense. The other warnings suggest that the fit is a bit unstable, but scaling and centering some of the input variables can make the warnings go away. It's up to you, again, to decide whether the results from these different formulations are different enough for you to worry about ...

    data2 <- read.csv("1005cs.csv")
    library(lme4)
    

    Fit model (slightly more compact model formulation)

    dat.asr3<-glmer(
       csns~Day*Region*treatment+
            (1|tub)+(1|aquarium),
                weights=start, family=binomial, data=data2)
    

    I do get the warnings you report.

    Let's take a look at the data:

    library(ggplot2); theme_set(theme_bw())
    ggplot(data2,aes(Day,csns,colour=factor(treatment)))+
            geom_point(aes(size=start),alpha=0.5)+facet_wrap(~Region)
    

    enter image description here

    Nothing obviously problematic here, although it does clearly show that the data are very close to 1 for some treatment combinations, and that the treatment values are far from zero. Let's try scaling & centering some of the input variables:

    data2sc <- transform(data2,
                         Day=scale(Day),
                         treatment=scale(treatment))
    dat.asr3sc <- update(dat.asr3,data=data2sc)
    

    Now the "very large eigenvalue" warning is gone, but we still have the "non-integer # successes" warning, and a max|grad|=0.082. Let's try another optimizer:

    dat.asr3scbobyqa <- update(dat.asr3sc,
                         control=glmerControl(optimizer="bobyqa"))
    

    Now only the "non-integer #successes" warning remains.

    d1 <- deviance(dat.asr3)
    d2 <- deviance(dat.asr3sc)
    d3 <- deviance(dat.asr3scbobyqa)
    c(d1,d2,d3)
    ## [1] 12597.12 12597.31 12597.56
    

    These deviances don't differ by very much (0.44 on the deviance scale is more than could be accounted for by round-off error, but not much difference in goodness of fit); actually, the first model gives the best (lowest) deviance, suggesting that the warnings are false positives ...

    resp <- with(data2,csns*start)
    plot(table(resp-floor(resp)))
    

    enter image description here

    This makes it clear that there really are non-integer responses, so the warning is correct.