I'm trying to simulate a binary outcome where I have N subjects (with subject-specific probabilities) measured in two distinct periods (say before and after). I want to increase the subject-specific probabilities by a certain odds ratio (OR) value between the two periods.
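For illustration, on the logit scale this amounts to adding log(OR) to a subject's linear predictor; with a single made-up probability p0 and OR = 2, for example:
p0 <- 0.30                          # probability in the first period
OR <- 2                             # predefined odds ratio
p1 <- plogis(qlogis(p0) + log(OR))  # shifted probability in the second period
(p1 / (1 - p1)) / (p0 / (1 - p0))   # recovers the odds ratio, i.e. 2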
After the simulation, I used glm and lme4::glmer to check whether my predefined odds ratio was correctly estimated. I expected that only the OR estimated by glm would be biased. However, the OR estimated by lme4::glmer was also biased as my predefined OR value increased. How can I correct this bias?
Thank you very much.
Below is my simulation:
rm(list=ls(all=TRUE))
library(lme4)
library(ggplot2)
N = 2000 #Number of subjects
X = 1:20 #Odds ratio values tested
set.seed(20)
P = runif(N,-4,4) #Subject-specific probability (in logit scale)
#Vectors that will be used to create a data frame
ind = rep(paste0("Sub",1:N),2) #Vector of individuals
x1 = c(rep(0,N),rep(1,N)) #Categorical Predictor Variable x1
OR.glm = NULL;OR.glmer = NULL
#Loop over X
for (OR in X){
  value = rbinom(N,1,plogis(P)) #Simulating values for x1=0
  value.simu = rbinom(N,1,plogis(P+log(OR))) #Simulating values for x1=1
  df = data.frame(ind=ind,y=c(value,value.simu),x1=x1) #Creating data frame
  #Using glm
  GLM = glm(y~factor(x1),data=df,family="binomial")
  OR.glm = c(OR.glm,exp(GLM$coef[2]))
  #Using glmer with a random intercept per subject
  GLMER = glmer(y~factor(x1)+(1|ind),data=df,family="binomial")
  OR.glmer = c(OR.glmer,exp(summary(GLMER)$coef[2,1]))
}
DF = data.frame(method = rep(c("glm","glmer"),each=length(X)),
                data = c(OR.glm,OR.glmer),x = rep(X,2))
ggplot(DF,aes(x = x,y = data,group=method, colour=method))+ theme_bw()+
  geom_point() + stat_smooth(method = 'loess') +
  geom_abline(slope=1, intercept=0) + ylim(0, max(X)) + xlim(0, max(X)) +
  xlab("Expected OR") + ylab("Observed OR")
As far as I can see, you do not simulate normal random effects, which is the assumption behind the mixed-effects logistic regression model fitted by glmer().
The code below simulates data with a normal random effect and fits the model with glmer() from lme4 and with mixed_model() from GLMMadaptive, which by default uses adaptive Gaussian quadrature for the estimation. (On purpose, the code works with design matrices for the fixed and random effects, to make it easier to extend if you want.)
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
# we construct a data frame with the design:
DF <- data.frame(id = rep(seq_len(n), each = K),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex, data = DF)
Z <- model.matrix(~ 1, data = DF)
betas <- c(-2.13, 1) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
# we simulate random effects
b <- rnorm(n, sd = sqrt(D11))
# linear predictor
eta_y <- drop(X %*% betas + rowSums(Z * b[DF$id]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
###############################################################################
library("lme4")
fm <- glmer(y ~ sex + (1 | id), data = DF, family = binomial())
summary(fm)
library("GLMMadaptive")
gm <- mixed_model(y ~ sex, random = ~ 1 | id, data = DF, family = binomial())
summary(gm)
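If you want to check that the fixed-effect odds ratio is recovered without bias, one option is to repeat the simulation above several times and average the estimates. A minimal sketch, reusing the objects (n, K, X, Z, betas, D11, DF) defined above; the number of replications nsim is an arbitrary choice:
nsim <- 50                                  # number of simulated data sets (arbitrary)
est <- numeric(nsim)
for (i in seq_len(nsim)) {
  b <- rnorm(n, sd = sqrt(D11))             # new normal random intercepts
  eta_y <- drop(X %*% betas + rowSums(Z * b[DF$id]))
  DF$y <- rbinom(n * K, 1, plogis(eta_y))
  fit <- glmer(y ~ sex + (1 | id), data = DF, family = binomial())
  est[i] <- fixef(fit)[2]                   # estimated log odds ratio for sex
}
exp(mean(est))      # should be close to the true odds ratio exp(betas[2]) = exp(1)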