I am trying to fit a mixed-effects Poisson model. The model fails to converge when I include one specific variable, and I am hoping to get thoughts on why that might be. Here is a segment of my data.
id gender race gene grade y
1 0 1 -1.5 6 4
1 0 1 -2.1 7 2
1 0 1 1.5 8 6
2 1 2 3.6 6 4
2 1 2 2.1 7 3
2 1 2 1.6 8 1
I used the code below and I am getting the error message below.
library(lme4)
m2 <- glmer(y ~ gender + race + gene + grade + (1 | id),
            data = data_long_1, family = poisson(link = "log"),
            control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00392577 (tol = 0.002, component 1)
The problem seems to be the "grade" variable: when I remove it, I don't get that warning. Everyone has 3 grades (6, 7, 8). Ideally, I want to fit grade x gene interactions, but I won't be able to do that if grade isn't in the model.
The estimated coefficients are:
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.683e+00 4.653e-02 36.159 < 2e-16 ***
gender1 -3.194e-02 3.584e-02 -0.891 0.37288
race1 1.329e-01 4.249e-02 3.127 0.00177 **
gene 8.298e-03 2.499e-02 0.332 0.73983
grade 2.980e-07 6.552e-03 0.000 0.99996
gene:grade 3.346e-07 6.768e-03 0.000 0.99996
Can someone provide insight into why this variable might be a problem?
I can't replicate your convergence warnings: with the data you sent off-line, on Linux, with a development version of lme4, I don't get any convergence warnings; such platform-dependence is not terribly unusual ...
However, I think I can explain your results based on the structure of the data you sent. Here is a sample for a typical individual, with the values modified for confidentiality:
id gender race y gene grade
1 xxxx 1 1 8 -1.543210 6
2 xxxx 1 1 8 -1.543210 7
3 xxxx 1 1 8 -1.543210 8
4 xxxx 1 1 8 -1.543210 9
gender, race, gene, and y (the response variable) do not vary within id (this is important).
grade varies within id, and it is perfectly balanced: each id has exactly four observations, for grade = 6, 7, 8, 9.
This means that the average effect of grade on y, or the interaction of anything with grade, is exactly zero!
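As a rough illustration of the point above, here is a minimal sketch on simulated data (not the real data set). It assumes the structure described: y and the covariates are constant within id, and every id has grade = 6, 7, 8, 9. To stay dependency-free it uses a plain Poisson glm rather than glmer, and all object names (ids, d, n_distinct_within, fit) are made up for the example.

```r
# Simulated stand-in for the real data (all values here are made up).
set.seed(1)
n_id <- 50
ids <- data.frame(
  id     = seq_len(n_id),
  gender = rbinom(n_id, 1, 0.5),
  gene   = rnorm(n_id),
  y      = rpois(n_id, lambda = 5)
)

# Repeat each individual's row once per grade, so only grade varies within id.
grades <- 6:9
d <- ids[rep(seq_len(n_id), each = length(grades)), ]
d$grade <- rep(grades, times = n_id)

# Largest number of distinct values each variable takes within any single id.
n_distinct_within <- sapply(
  d[c("gender", "gene", "y", "grade")],
  function(v) max(tapply(v, d$id, function(x) length(unique(x))))
)
print(n_distinct_within)

# Fixed-effects Poisson fit; the grade and gene:grade estimates are
# zero up to floating-point error.
fit <- glm(y ~ gender + gene * grade, data = d, family = poisson)
print(round(coef(fit), 6))
```

Running the same within-id check on the real data frame is a quick way to confirm whether it has this structure before fitting anything.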
Since this data set doesn't really have more than one observation's worth of information about each id (i.e. the same values are repeated 4 times for each individual, except for grade), it might be better to take only the first observation for each individual and fit
glm(y ~ gender + race + gene, data=..., family=poisson)
(I usually omit the (link="log") because it's the default, but it's fine to include it if it makes the code clearer).
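One possible way to do the "first observation per individual" step is sketched below. The data are simulated and the names per_id, data_long and data_first are hypothetical; with the real data you would apply the duplicated() line to your own long-format data frame.

```r
# Simulated long-format data: one set of values per id, repeated for 4 grades.
# All values are made up for illustration.
set.seed(42)
n_id <- 30
per_id <- data.frame(
  id     = seq_len(n_id),
  gender = rbinom(n_id, 1, 0.5),
  race   = factor(sample(1:2, n_id, replace = TRUE)),
  gene   = rnorm(n_id),
  y      = rpois(n_id, lambda = 4)
)
data_long <- per_id[rep(seq_len(n_id), each = 4), ]
data_long$grade <- rep(6:9, times = n_id)

# Keep only the first row for each id (duplicated() flags the later repeats).
data_first <- data_long[!duplicated(data_long$id), ]

# Ordinary Poisson GLM on one row per individual; no random effect is needed
# because each id now contributes a single observation.
fit <- glm(y ~ gender + race + gene, data = data_first, family = poisson)
print(summary(fit))
```

This relies on the rows for each id being identical apart from grade; if they are not, which row counts as "first" starts to matter.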
A similar question shows that things get more pathological if you try to fit a model with a residual variance term (e.g. LMM/Gaussian response) to such a data set ...