Search code examples
outputstanrstan

How does rstan store posterior samples for separate chains?


I would like to understand how the output of extract in rstan orders the posterior samples. I understand that I can view the posterior samples from each chain by using as.array,

stanfit <- sampling(
  model,
  data = stan.data)
​
fitarray <- as.array(stanfit)

For example, fitarray[, 2, 1] will give me the samples for the second chain of the first parameter. One way to store the posterior samples in the output of extract would be just to concatenate them. When I do,

fit <- extract(stanfit)
mean(fitarray[,2,1]) == mean(fit$ss[1001:2000]) 

for several chains and parameters I always get TRUE (ss is the first parameter). This makes it seem like the posterior samples are being concatenated in fit. However, when I do,

fitarray[,2,1] == fit$ss[1001:2000]

I get FALSE (confirmed that there's not just precision difference). It appears that fitarray and fit are storing the iterations differently. How do I view the iterations (in order) of each posterior sample chain separately?


Solution

  • As can be seen from rstan:::as.array.stanfit, the as.array method is essentially defined as

    extract(x, permuted = FALSE, inc_warmup = FALSE)
    

    Your default use of extract keeps the warmup and permutes the post-warmup draws randomly, which is why the indices do not line up with the as.array output.