PIRLS step halving with cloglog link in lme4

I'm getting the PIRLS step-halving error using lme4::lmer() with family = binomial(link = cloglog). Seems possible to me that I've generated data for which the model below is too complex, which may be the full answer, but I don't understand the error note well enough to be confident. I saw another answer that referenced behavior of less-canonical link function, and wondered if there could be issues in the machinery leading to this error as well/instead.

data, using dput:

sub <- structure(list(sp = c("1", "1", "1", "1", "1", "2", "2", "2", 
                             "2", "2", "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "5", 
                             "5", "5", "5", "5"), rplct = c("X1", "X2", "X3", "X4", "X5", 
                                                            "X1", "X2", "X3", "X4", "X5", "X1", "X2", "X3", "X4", "X5", "X1", 
                                                            "X2", "X3", "X4", "X5", "X1", "X2", "X3", "X4", "X5"), abundance = c(131L, 
                                                                                                                                 124L, 128L, 114L, 108L, 62L, 57L, 62L, 65L, 54L, 45L, 45L, 35L, 
                                                                                                                                 38L, 33L, 29L, 31L, 30L, 32L, 30L, 21L, 20L, 25L, 23L, 28L), 
                      pres = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                               1, 1, 1, 1, 1, 1, 1, 1, 1), tot_ab = c(288L, 277L, 280L, 
                                                                      272L, 253L, 288L, 277L, 280L, 272L, 253L, 288L, 277L, 280L, 
                                                                      272L, 253L, 288L, 277L, 280L, 272L, 253L, 288L, 277L, 280L, 
                                                                      272L, 253L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                                                                            -25L))

No errors fitting a model with random effects but no offset term:

no_off <- lme4::glmer(pres ~ (1|sp)
                       , family = binomial(link = cloglog)
                      , data = sub)

Nor with offsets but no random effect:

no_re <- glm(pres ~ offset(log(tot_ab)) 
                             , family = binomial(link = cloglog)
                             , data = sub)

but PIRLS halving error with both offset and REs

smaller <- lme4::glmer(pres ~ offset(log(tot_ab)) + (1|sp)
               , family = binomial(link = cloglog)
               , data = sub)

What's driving this error in this example?

I was able to generate other interesting and more parseable errors, such as Error: cannot generate feasible simplex when simulating inappropriate data, not sure what my problem is here though.

Thanks!

Solution

This problem is fundamentally caused by the thin upper tail of the cloglog inverse-link function, which interacts with the default starting values for the parameters; fixed-effect coefficients are set to zero by default, random-effects (scaled) variances are set to 1 by default. This usually makes sense, but here's why it doesn't work when we have a large offset and a cloglog link:

cc <- make.link("cloglog")$linkinv
par(las=1, bty = "l")
curve(1-cc(x), from = -5, to = 10, log = "y", n= 501,
          ylab = "1-Prob(x)", xlab = "Linear predictor")

Thus any value of the linear predictor greater than about 3.6 will give a constant value of 1-.Machine$double.eps.

If all of the offsets are significantly above 3.6 (i.e. total abundance greater than approx exp(3.6) = 36.6), then all of the predicted values will be on the upper 'shelf'; furthermore, the gradient of the inverse link function is zero in this region too, although R sets it to .Machine$double.eps in the region where it would likely underflow.

It's not immediately obvious why this should break the PIRLS algorithm; it does depend on the derivative (see equation 13 here, but this equation isn't obviously broken by the scenario described here. (The inverse variance used in the weight matrix would be large, of order 1/.Machine$double.eps, but again, this isn't necessarily a problem.

library(lme4)
simfun <- function(seed = NULL) {
    if (!is.null(seed)) set.seed(seed)
    dd <- data.frame(sp = factor(rep(1:30, each = 5)),
                     tot_ab = rnbinom(150, mu = 250, size = 50))
    dd$pres <- suppressMessages(simulate(~ (1|sp) + offset(log(tot_ab))
                      , family = binomial(link = "cloglog")
                      , newdata = dd
                      , newparam = list(beta=-8, theta = 1))[[1]])
    dd
}

Solution: make the starting value for the intercept (the only fixed effect parameter in this example) much smaller, so that the starting values of the predictions are in a more reasonable range:

glmer(pres ~ (1|sp) + offset(log(tot_ab)),
      family = binomial(link = "cloglog"),
      data = dd,
      start = list(fixef = -log(min(dd$tot_ab))))

The fit works (although we do get some convergence warnings).

It might work to define one's own cloglog link function that set the floor and ceiling differently, but this might just run into other numerical problems...

An alternative solution: rather than use log of total abundance as the offset, use log(tot_ab)/min(tot_ab); that is, rather than model the probability of occurrence per individual organism sampled, model the occurrence of probability in a sample the size of the minimum sample. (You should also be able to pick a round number of the same order of magnitude as the smallest sample size.) The code below demonstrates that in 1000 runs on simulated data, the fit always fails with offset(log(tot_ab)) but always succeeds (although there may be singular fits, convergence warnings, etc.) with the minimum-scaled offset.

fitfun <- function(dd, fix_offset = FALSE) {
    if (!fix_offset) {
        form <- pres ~ (1|sp) + offset(log(tot_ab))
    } else {
        form <- pres ~ (1|sp) + offset(log(tot_ab)/min(tot_ab))
    }
    res <- try(
        suppressMessages(
            glmer(
                form
              , family = binomial(link = "cloglog")
              , data = dd)), silent = TRUE
    )
    if (!inherits(res, "try-error")) return("OK")
    as.character(res)
}

Fit with original offset:

set.seed(101)
res <- replicate(1000, fitfun(simfun()))
table(res)

Error : (maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
                                                                                     8 
                                            Error : cannot generate feasible simplex
                                                                                   983 
                          Error : pwrssUpdate did not converge in (maxit) iterations

                                                                                     9

Now fit with min-scaled offset:

res <- replicate(1000, fitfun(simfun(), fix_offset = TRUE))
table(res)

res
  OK 
1000