I want to graph the histograms of parameter estimates from a stan model against the priors for those parameters. I have tried doing this by running a model in stan, graphing it with ggplot2, then overlaying an approximation of the prior distribution using R's random generator function (e.g. rnorm()
, rbinom()
) but I have run into many scaling issues that make the graphs impossible to get looking right.
I was thinking a better way to do it would be simply to sample directly from the prior distribution and then graph those samples against the parameter estimates, but running a whole separate model just to sample from the priors seems very time-consuming. I was wondering if there was a way to do this within, or rather parallel-to, an existing model.
Here is a sample script.
# simulate linear model
a <- 3 # intercept
b <- 2 # slope
# data
x <- rnorm(28, 0, 1)
eps <- rnorm(28, 0, 2)
y <- a + b*x + eps
# put data into list
data_reg <- list(N = 28, x = x, y = y)
# create the model string
ms <- "
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
vector[N] mu;
sigma ~ cauchy(0, 2);
beta ~ normal(0,10);
alpha ~ normal(0,100);
for ( i in 1:N ) {
mu[i] = alpha + beta * x[i];
}
y ~ normal(mu, sigma);
}
"
# now fit the model in stan
fit1 <- stan(model_code = ms, # model string
data = data_reg, # named list of data
chains = 1, # number of Markov chains
warmup = 1e3, # number of warmup iterations per chain
iter = 2e3) # show progress every 'refresh' iterations
# extract the sample estimates
post <- extract(fit1, pars = c("alpha", "beta", "sigma"))
# now for the density plots. Write a plotting function
densFunct <- function (parName) {
g <- ggplot(postDF, aes_string(x = parName)) +
geom_histogram(aes(y=..density..), fill = "white", colour = "black", bins = 50) +
geom_density(fill = "skyblue", alpha = 0.3)
return(g)
}
# plot
gridExtra::grid.arrange(grobs = lapply(names(postDF), function (i) densFunct(i)), ncol = 1)
Now I understand that I can sample from the prior by simply omitting the likelihood from the model string, like so
ms <- "
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
sigma ~ cauchy(0, 2);
beta ~ normal(0,10);
alpha ~ normal(0,100);
}
"
But is there any way to get the samples from the prior within the first model? Maybe via the generated quantities block?
There are two ways you can do this.
First, if the program is general enough, just pass in zero-size data so that the posterior is the prior. For instance, N = 0
in the regression example you gave will work (along with the right zero-sized x and y).
Second, you can write a pure Monte Carlo generator (doesn't use MCMC) in the generated quantities block. Something like:
generated quantities {
real<lower = 0> sigma_sim = cauchy_rng(0, 2); // wide tail warning!
real beta_sim = normal_rng(0, 10);
real alpha_sim = normal_rng(0, 20);
}
The second approach is much more efficient as it conveniently draws an independent sample and doesn't have to do any MCMC.