Search code examples
rbayesianstanrstanlog-likelihood

Evaluating log-likelihood of unseen data in rstan


I understand I can calculate the log likelihood of each sample during sampling, e.g.

...

model {

  for (i in 1:N) {
    (y[i] - 1) ~ bernoulli(p[i, 2]);
  }

}

generated quantities {

  vector[N] log_lik;
  for (i in 1:N){
    log_lik[i] = bernoulli_lpmf((y[i] - 1) | p[i, 2]);
  }

}

After fitting, I can then extract log likelihood using the loo package:

log_lik_m <- extract_log_lik(stan_fit)

But I want to evaluate log likelihood of unseen data. This is possible in brms:

ll <- log_lik(fit_star, newdata = new_df)

But I would like to do this with rstan, since I can't easily define my model in brms (I am assuming).

For reference, I am trying to use Estimated LFO-CV to evaluate and compare my time-series model. (e.g. https://github.com/paul-buerkner/LFO-CV-paper/blob/master/sim_functions.R#L186)

(https://mc-stan.org/loo/articles/loo2-lfo.html)


Solution

  • Thanks to the link from @dipetkov, I solved this myself. I didn't use the exact methods in the link, but came up with an alternative. You can call stan functions from R to get it to compute log likelihood for your model, even with unseen data (and its very fast!).

    First, I put everything in my transformed parameters block into a function in stan's functions block. Then, I created a second function that wraps the first function, and evaluates the log likelihood for given observations and provided parameter estimates (I then removed my generated_quantities block). rstan has a function expose_stan_functions which adds all functions in the stan functions block to the R environment.

    You can then call the log likelihood function you made to evaluate your model with any observations (previously seen or unseen), along with a set of parameter estimates.