Tags: r, modeling, stan, brms

brms - understanding chains, iter and warmup


I am trying to use the brms package's brm function to fit Bayesian mixed-effects models. The documentation isn't very clear on what exactly is achieved by increasing the number of chains, the number of iterations, and the warmup. It would be helpful if someone could explain these.

brmmod <- brm(
  data = modeling_input,
  formula = brm_formula,
  prior = brm_prior,
  cores = 1, chains = 4, iter = 1000, warmup = 500
)

I have realized that setting the number of cores equal to the number of chains gives me the lowest runtime.

I want to understand: a) How will increasing the values of the iter and warmup parameters help me? b) If I increase the number of Markov chains, how does that impact the model? c) If I spin up a machine with as many cores as the chains parameter, would that give me the best performance in terms of runtime? My current model with cores = 1 takes 3 days to run. I changed cores to 4 without changing any other parameters, which brought the runtime down to 2 days.

I am new to this, so I'd appreciate some help. Happy to read any good material or blog posts. I have tried to find more details, but the documentation isn't very helpful.

It can be found here (https://cran.r-project.org/web/packages/brms/brms.pdf); the relevant page is p. 27.


Solution

  • To answer your question, it's helpful to remember that we are trying to sample the posterior distribution. To that end, we need "enough" samples to make reasonable statements about the posterior. How much is enough? This is going to depend on our model (more complex models usually need more samples) and our needs.

    We gather samples through Markov chain Monte Carlo (MCMC) sampling: start a chain with some initial parameters, run it for a while to get into the region of interest (warmup), and then sample the distribution. Each chain therefore contributes iter - warmup samples, for chains x (iter - warmup) samples in total.
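    As a quick sanity check of that arithmetic, using the settings from the call in the question:

    ```r
    chains <- 4
    iter   <- 1000   # total iterations per chain
    warmup <- 500    # iterations discarded as warmup, per chain

    per_chain_draws <- iter - warmup            # 500 post-warmup draws per chain
    total_draws     <- chains * per_chain_draws # 2000 draws from the posterior overall
    ```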

    Now to your questions:

    a) how will increasing the value of iter and warmup parameters help me.

    Increasing iter increases the number of samples drawn from the posterior. Once the estimates have converged, additional samples don't really improve them. You'll have to test this for your model, but the defaults (iter = 2000 and warmup = floor(iter / 2)) are usually reasonable.
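    If the total number of draws turns out to be too few (e.g. effective sample sizes are low), a refit with a larger iter is straightforward. A sketch reusing the objects from the question (modeling_input, brm_formula, and brm_prior are the asker's; the specific numbers here are just an illustration):

    ```r
    library(brms)

    brmmod_long <- brm(
      data    = modeling_input,
      formula = brm_formula,
      prior   = brm_prior,
      cores   = 4, chains = 4,
      iter    = 4000, warmup = 2000  # 2000 post-warmup draws per chain, 8000 total
    )
    ```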

    b) If I increase the number of Markov chains, how does it impact the model?

    Adding more chains increases the number of useful samples we get, at the expense of more computation time. We also gain some confidence that our conclusions don't depend on the starting conditions, since each chain gets different starting conditions.
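    One practical way to use the extra chains is to check that they agree with each other. With a fitted brmsfit object, the standard diagnostics are:

    ```r
    summary(brmmod)  # Rhat near 1 suggests the chains converged to the same distribution
    plot(brmmod)     # trace plots per parameter; well-mixed chains should overlap
    ```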

    c) If I spin a machine equal to the number of chains parameter, would that give me the best performance in terms of runtime? My current model with cores = 1 takes 3 days to run. I however changed the cores to 4 and didn't change any other parameters. This helped me bring down the runtime to 2 days.

    Chains are embarrassingly parallel, so if we set cores = chains (and we have that many cores) we can reduce the computation time accordingly. Increasing chains would let us use more cores, but we would also need to decrease iter to keep the same number of post-warmup samples. Remember that each chain has warmup samples that are discarded, so the warmup overhead is proportional to chains (i.e. adding more chains isn't guaranteed to reduce wall-clock time). You might try choosing parameters with a subset of your data and then fitting a final model on all your data.
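    A sketch of the prototype-on-a-subset idea (the subset size of 1000 rows is an assumption; adjust it to your data):

    ```r
    set.seed(1)
    # Hypothetical random subset for quick experiments with iter/warmup/chains
    subset_input <- modeling_input[sample(nrow(modeling_input), 1000), ]

    brmmod_test <- brm(
      data    = subset_input,
      formula = brm_formula,
      prior   = brm_prior,
      cores   = 4, chains = 4,
      iter    = 1000, warmup = 500
    )
    # Once the settings look good (Rhat, effective sample size),
    # refit with the same settings on the full modeling_input
    ```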