I am doing glm
regression with a 0-1 distribution dataset. It's going well with ggplot2::geom_smooth
; here is my code:
library(ggplot2)
set.seed(123)
df <- transform(data.frame(Conc=runif(200, min=200, max=1000)),
AE=rbinom(200, 1, prob=plogis((Conc - 600)/100)))
ggplot(df, aes(x = Conc, y = AE)) +
geom_jitter(height = 0.05, alpha = 0.5) +
geom_smooth(method = "glm", formula = y ~ log(x),
method.args = list(family = "binomial"),
color = "grey10")
Now, I want to plot the 95% confidence interval with bootstrap. I tried with boot
package and ggplot2::mean_cl_boot
, but all failed.
I didn't kept all the code that didn't work, but this is the latest code I tried. To be honest, I copied and tried these codes from other answers, but I do not fully understand these codes.
lm_coeffs = function(x, y) {
coeffs = coefficients(lm(y~log(x)))
tibble(C = coeffs[1], m=coeffs[2])
}
nboot = 1000
mtboot = lapply (seq_len(nboot), function(i)
df %>%
slice_sample(prop=1, replace=TRUE) %>%
summarise(tibble(lm_coeffs(Conc, AE))))
mtboot = do.call(rbind, mtboot)
ggplot(df, aes(Conc, AE)) +
geom_abline(aes(intercept=C, slope=m), data = mtboot,
size=0.3, alpha=0.3, color='forestgreen') +
geom_point()
This is your model,
> fit <- glm(AE ~ Conc, family='binomial', data=df)
> ndf <- data.frame(Conc=seq(min(df$Conc), max(df$Conc), length.out=1e3))
> pred <- predict(fit, newdata=ndf, type='link')
that you want to bootstrap,
> set.seed(42)
> bf <- replicate(
+ 999L, {
+ bdf <- df[sample.int(nrow(df), replace=TRUE), ]
+ glm(AE ~ Conc, family='binomial', data=bdf)
+ },
+ simplify=FALSE
+ )
and from the bootstrapped pred
ictions,
> bpred <- sapply(bf, predict, newdata=ndf, type='link')
you want the 95% CI (showing percentile bootstrap method).
> ci <- \(x, sd) x + as.matrix(sd*(-qt(.025, Inf))) %*% cbind(-1, 1)
> bpredci <- ci(matrixStats::rowMeans2(bpred), matrixStats::rowSds(bpred))
To plot it, you want to apply the fit$family$linkinv()
function, as @GavinSimpson explained in this answer a while ago, like this:
> plot(df)
> lines(ndf$Conc, fit$family$linkinv(pred), col=4)
> for (j in 1:2) lines(ndf$Conc, fit$family$linkinv(bpredci[, j]), col=4, lty=2)
Gavin's answer provides the ggplot2 way.
PS: If I use sample data in OP, it looks similar to first plot there.
Data:
> set.seed(42)
> df <- transform(
+ data.frame(Conc=runif(200, min=200, max=1000)),
+ AE=rbinom(200, 1, prob=plogis((Conc - 600)/100))
+ )