I am looking at the "MCMC Example" tutorial from the mcmc package in R. The code below is taken from there; it uses MCMC to estimate the parameters of a logistic regression model.
library(mcmc)
data(logit)
foo <- logit
out <- glm(y ~ x1 + x2 + x3 + x4, data = foo, family = binomial())
x <- foo
x$y <- NULL
x <- as.matrix(x)
x <- cbind(1, x)
dimnames(x) <- NULL
y <- foo$y
# log unnormalized posterior: Bernoulli log-likelihood plus log-prior
lupost <- function(beta, x, y) {
    eta <- x %*% beta            # linear predictor
    p <- 1 / (1 + exp(-eta))     # success probabilities (inverse logit)
    logl <- sum(log(p[y == 1])) + sum(log(1 - p[y == 0]))
    # independent Normal(0, sd = 2) log-priors on each coefficient
    return(logl + sum(dnorm(beta, 0, 2, log = TRUE)))
}
set.seed(42)
beta.init <- as.numeric(coefficients(out))
out <- metrop(lupost, beta.init, 1000, x=x, y = y)
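(As an aside: the object metrop() returns is a list whose documented components include accept, the observed acceptance rate of the Metropolis proposals, and batch, the matrix of batch means. A quick way to inspect it:
out$accept      # fraction of proposals accepted
dim(out$batch)  # 1000 x 5 here: blen defaults to 1, so one row per iteration
)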
The idea is to compute the Monte Carlo standard error. To that end,
> apply(out$batch, 2, mean)
[1] 0.6531950 0.7920342 1.1701075 0.5077331 0.7488265
[6] 0.5145751 0.7560775 1.4973807 0.3913837 0.7244162
is used. My question is: what is the meaning of the final 5 columns in the output of this command? The tutorial states that it is somehow E(X^2), but in which line is X generated? What is X here?
If I run just the code you posted in your question above, then I only get 5 numbers:
apply(out$batch, 2, mean)
# [1] 0.5990174 0.8027482 1.1539329 0.4547702 0.7172724
It seems like the "tutorial" you are following is the mcmc::demo vignette. The vignette does all of the steps you posted above, and then it also calculates
out <- metrop(out, nbatch = 100, blen = 100, outfun = function(z) c(z, z^2))
followed by
apply(out$batch, 2, mean)
# [1] 0.6531950 0.7920342 1.1701075 0.5077331 0.7488265
# [6] 0.5145751 0.7560775 1.4973807 0.3913837 0.7244162
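To see concretely what outfun does, here is a toy illustration (the z below is a made-up state vector, not actual chain output): outfun maps each 5-vector of betas to a 10-vector, so out$batch now has 10 columns instead of 5.
z <- c(0.65, 0.79, 1.17, 0.51, 0.75)  # hypothetical state of the chain (5 betas)
c(z, z^2)                             # what outfun returns: 10 numbers
dim(out$batch)                        # 100 batches x 10 columns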
Here you can see that outfun is calculating X and X^2, and that you get estimates of E[X] and E[X^2] when you take the average using apply().
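Since both E[X] and E[X^2] are stored, you can also recover the posterior variances via the identity Var(X) = E[X^2] - (E[X])^2. A minimal sketch (variable names here are mine):
m <- apply(out$batch, 2, mean)
mu <- m[1:5]                # posterior means, E[X]
sigmasq <- m[6:10] - mu^2   # posterior variances, E[X^2] - E[X]^2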
The X's here appear to be draws from the posterior distribution of the model parameters (the betas). Notice that the Bayesian posterior means (the first 5 numbers) are quite close to the non-Bayesian point estimates calculated with glm in the fourth line of your code.
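For a quick side-by-side comparison (a sketch; it uses beta.init, which still holds the glm coefficients, because out was overwritten by metrop):
cbind(glm = beta.init,
      posterior.mean = apply(out$batch, 2, mean)[1:5])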