Let us say I have data for 300 firms(level 1) in 10 countries (level 2). The level 1 variables are PQ and size. Level 2 variable is the per capita GDP.
library(lme4)
set.seed(1)
PQ <- runif(300,7,21)
id <- (1:300)
country <- sample(1:10,300,replace=T)
size <- sample(1:25000,300,replace=T)
GDP <- sample(800:40000,10,replace=T)
Country1 <- 1:10
L1data <- as.data.frame(cbind(id,country,PQ,size))
L2data <- as.data.frame(cbind(Country1,GDP))
MLdata <- merge(L1data,L2data, by.x = "country", by.y = "Country1")
dummymodel <- lmer(PQ ~ size + GDP + (size|country), data = MLdata, REML = F)
When I run this I get warning messages
Warning messages: 1: Some predictor variables are on very different scales: consider rescaling 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 1.77081 (tol = 0.002, component 1) 3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model is nearly unidentifiable: very large eigenvalue - Rescale variables?; Model is nearly unidentifiable: large eigenvalue ratio - Rescale variables?
In fact when I run the model with the original data I get an additional warning message:
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
I guess I need to scale the independent variables to solve this problem. What is the correct way to scale in multilevel regression like this? The question is important as the results of multilevel models are dependent on scaling. Or is there some other problem that I am not able to locate?
tl;dr The models have nearly equivalent goodness of fit; the scaled model is slightly better and more reliable; the fixed effects are nearly equivalent; both models estimate singular random-effects variance-covariance matrices, which makes comparison much harder and means you shouldn't be relying on these models for conclusions about variances in any case ...
The model should have the same meaning after centering and rescaling. As I'll show below, the fixed effects are essentially equivalent. I'm finding it harder to convince myself that the variance-covariance estimates are equivalent, but the model is singular anyway (i.e., there isn't enough information to fit a positive-definite variance-covariance matrix).
Using your example and re-running with scaled parameters:
MLdata <- transform(MLdata,
size_cs=scale(size),
GDP_cs=scale(GDP))
m2 <- lmer(PQ ~ size_cs + GDP_cs + (size_cs|country), data = MLdata,
REML = FALSE)
Comparing the log-likelihoods:
logLik(dummymodel) ## -828.4349
logLik(m2) ## -828.4067
This suggests that the models are pretty close, but the scaled model has done slightly better (although an improvement of 0.03 log-likelihood units is very small).
The fixed effects look different, but I'm going to show that they're equivalent:
fixef(dummymodel)
## (Intercept) size GDP
## 1.345754e+01 3.706393e-05 -5.464550e-06
## fixef(m2)
## (Intercept) size_cs GDP_cs
## 13.86155940 0.27300688 -0.05914308
(If you look at coef(summary(.))
for both models, you'll see that the t-statistics for size
and GDP
are nearly identical.)
From this question
rescale.coefs <- function(beta,mu,sigma) {
beta2 <- beta ## inherit names etc.
## parameters other than intercept:
beta2[-1] <- sigma[1]*beta[-1]/sigma[-1]
## intercept:
beta2[1] <- sigma[1]*beta[1]+mu[1]-sum(beta2[-1]*mu[-1])
return(beta2)
}
cm <- sapply(MLdata[c("size","GDP")],mean)
csd <- sapply(MLdata[c("size","GDP")],sd)
rescaled <- rescale.coefs(fixef(m2),mu=c(0,cm),sigma=c(1,csd))
all.equal(rescaled,fixef(m2))
cbind(fixef(dummymodel),rescaled)
## rescaled
## (Intercept) 1.345754e+01 1.345833e+01
## size 3.706393e-05 3.713062e-05
## GDP -5.464550e-06 -5.435151e-06
Very similar.
With regard to the variance-covariance matrices:
VarCorr(dummymodel)
## Groups Name Std.Dev. Corr
## country (Intercept) 2.3420e-01
## size 1.5874e-05 -1.000
## Residual 3.8267e+00
VarCorr(m2)
## Groups Name Std.Dev. Corr
## country (Intercept) 0.0000e+00
## size_cs 4.7463e-08 NaN
## Residual 3.8283e+00
Neither variance-covariance matrix is positive definite; the first has the variation in intercept across countries perfectly correlated with variation in slope across countries, while the second assigns zero variance to the intercept across countries. In addition, the among-country variation is very small relative the residual variance in either case. If both matrices were positive definite we could work on finding the transformation that would scale from one case to the other (I think it would just be to multiply each element by (s_i s_j), where s_. is the scaling factor applied to a given predictor). When they're not PD, it becomes tricky.