Search code examples
rstanrstan

Sampling from prior without running a separate model


I want to graph the histograms of parameter estimates from a stan model against the priors for those parameters. I have tried doing this by running a model in stan, graphing it with ggplot2, then overlaying an approximation of the prior distribution using R's random generator function (e.g. rnorm(), rbinom()) but I have run into many scaling issues that make the graphs impossible to get looking right.

I was thinking a better way to do it would be simply to sample directly from the prior distribution and then graph those samples against the parameter estimates, but running a whole separate model just to sample from the priors seems very time-consuming. I was wondering if there was a way to do this within, or rather parallel-to, an existing model.

Here is a sample script.

# simulate linear model
a <- 3 # intercept
b <- 2 # slope

# data
x <- rnorm(28, 0, 1)
eps <- rnorm(28, 0, 2)
y <- a + b*x + eps

# put data into list
data_reg <- list(N = 28, x = x, y = y)

# create the model string

ms <- "
    data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
    }
    parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
    }
    model {
    vector[N] mu;
    sigma ~ cauchy(0, 2);
    beta ~ normal(0,10);
    alpha ~ normal(0,100);
    for ( i in 1:N ) {
    mu[i] = alpha + beta * x[i];
    }
    y ~ normal(mu, sigma);
    }
"

# now fit the model in stan
fit1 <- stan(model_code = ms,     # model string
             data = data_reg,        # named list of data
             chains = 1,             # number of Markov chains
             warmup = 1e3,          # number of warmup iterations per chain
             iter = 2e3)         # show progress every 'refresh' iterations

# extract the sample estimates
post <- extract(fit1, pars = c("alpha", "beta", "sigma"))

# now for the density plots. Write a plotting function
densFunct <- function (parName) {
  g <- ggplot(postDF, aes_string(x = parName)) + 
              geom_histogram(aes(y=..density..), fill = "white", colour = "black", bins = 50) +
              geom_density(fill = "skyblue", alpha = 0.3)
  return(g)
}

# plot 
gridExtra::grid.arrange(grobs = lapply(names(postDF), function (i) densFunct(i)), ncol = 1)

enter image description here

Now I understand that I can sample from the prior by simply omitting the likelihood from the model string, like so

ms <- "
  data {
    int<lower=0> N;
    vector[N] x;
    vector[N] y;
  }
  parameters {
    real alpha;
    real beta;
    real<lower=0> sigma;
  }
  model {
    sigma ~ cauchy(0, 2);
    beta ~ normal(0,10);
    alpha ~ normal(0,100);
  }
"

But is there any way to get the samples from the prior within the first model? Maybe via the generated quantities block?


Solution

  • There are two ways you can do this.

    First, if the program is general enough, just pass in zero-size data so that the posterior is the prior. For instance, N = 0 in the regression example you gave will work (along with the right zero-sized x and y).

    Second, you can write a pure Monte Carlo generator (doesn't use MCMC) in the generated quantities block. Something like:

    generated quantities {
      real<lower = 0> sigma_sim = cauchy_rng(0, 2);  // wide tail warning!
      real beta_sim = normal_rng(0, 10);
      real alpha_sim = normal_rng(0, 20);
    }
    

    The second approach is much more efficient as it conveniently draws an independent sample and doesn't have to do any MCMC.