I am creating a series of MCMC diagnostic plots in r using ggplot. I realize there is already a package available in gg for MCMC plotting, but much of this is for my own education as well as practical use. One thing I can't seem to figure out is how to generate the gelman.plot in a ggplot framework.
The gelman.diag function only returns a simple data point and I would like to recreate the complete running chart as shown in gelman.plot.
Is anyone familiar with the algorithmic structure of the gelman potential scale reduction factor and/or a means to port its output to ggplot?
Thank you!
You haven't provided a reproducible example, so I've used the example here. We need the object called combinedchains
from that example. In order to avoid cluttering the answer, I've put the code for that at the end of this post.
Now we can run gelman.plot
on combined.chains
. This is the plot we want to duplicate:
library(coda)
gelman.plot(combined.chains)
To create a ggplot version, we need to get the data for the plot. I haven't done MCMC before, so I'm going to let gelman.plot
generate the data for me. For your actual use case, you can probably just generate the appropriate data directly.
Let's look at what gelman.plot
is doing: We can see the code for that function by typing the bare function name in the console. A portion of the function code is below. The ...
show where I've removed sections of the original code for brevity. Note the call to gelman.preplot
, with the output of that function stored in y
. Note also that y
is returned invisibly at the end. y
is a list containing the data we need to create a gelman.plot
in ggplot.
gelman.plot = function (x, bin.width = 10, max.bins = 50, confidence = 0.95,
transform = FALSE, autoburnin = TRUE, auto.layout = TRUE,
ask, col = 1:2, lty = 1:2, xlab = "last iteration in chain",
ylab = "shrink factor", type = "l", ...)
{
...
y <- gelman.preplot(x, bin.width = bin.width, max.bins = max.bins,
confidence = confidence, transform = transform, autoburnin = autoburnin)
...
return(invisible(y))
}
So, let's get the data that gelman.plot
returns invisibly and store it in an object:
gp.dat = gelman.plot(combinedchains)
Now for the ggplot version. First, gp.dat
is a list and we need to convert the various parts of that list into a single data frame that ggplot can use.
library(ggplot2)
library(dplyr)
library(reshape2)
df = data.frame(bind_rows(as.data.frame(gp.dat[["shrink"]][,,1]),
as.data.frame(gp.dat[["shrink"]][,,2])),
q=rep(dimnames(gp.dat[["shrink"]])[[3]], each=nrow(gp.dat[["shrink"]][,,1])),
last.iter=rep(gp.dat[["last.iter"]], length(gp.dat)))
For the plot, we'll melt df
into long format, so that we can have each chain in a separate facet.
ggplot(melt(df, c("q","last.iter"), value.name="shrink_factor"),
aes(last.iter, shrink_factor, colour=q, linetype=q)) +
geom_hline(yintercept=1, colour="grey30", lwd=0.2) +
geom_line() +
facet_wrap(~variable, labeller= labeller(.cols=function(x) gsub("V", "Chain ", x))) +
labs(x="Last Iteration in Chain", y="Shrink Factor",
colour="Quantile", linetype="Quantile") +
scale_linetype_manual(values=c(2,1))
MCMC example code to create the combinedchains
object (code copied from here):
trueA = 5
trueB = 0
trueSd = 10
sampleSize = 31
x = (-(sampleSize-1)/2):((sampleSize-1)/2)
y = trueA * x + trueB + rnorm(n=sampleSize,mean=0,sd=trueSd)
likelihood = function(param){
a = param[1]
b = param[2]
sd = param[3]
pred = a*x + b
singlelikelihoods = dnorm(y, mean = pred, sd = sd, log = T)
sumll = sum(singlelikelihoods)
return(sumll)
}
prior = function(param){
a = param[1]
b = param[2]
sd = param[3]
aprior = dunif(a, min=0, max=10, log = T)
bprior = dnorm(b, sd = 5, log = T)
sdprior = dunif(sd, min=0, max=30, log = T)
return(aprior+bprior+sdprior)
}
proposalfunction = function(param){
return(rnorm(3,mean = param, sd= c(0.1,0.5,0.3)))
}
run_metropolis_MCMC = function(startvalue, iterations) {
chain = array(dim = c(iterations+1,3))
chain[1,] = startvalue
for (i in 1:iterations) {
proposal = proposalfunction(chain[i,])
probab = exp(likelihood(proposal) + prior(proposal) - likelihood(chain[i,]) - prior(chain[i,]))
if (runif(1) < probab){
chain[i+1,] = proposal
}else{
chain[i+1,] = chain[i,]
}
}
return(mcmc(chain))
}
startvalue = c(4,2,8)
chain = run_metropolis_MCMC(startvalue, 10000)
chain2 = run_metropolis_MCMC(startvalue, 10000)
combinedchains = mcmc.list(chain, chain2)
UPDATE: gelman.preplot
is an internal coda
function that's not directly visible to users. To get the function code, in the console type getAnywhere(gelman.preplot)
. Then you can see what the function is doing and, if you wish, construct your own function to return the appropriate diagnostic data in a form more suitable for ggplot.