Tags: python-3.x, pymc3, bambi, multilevel-analysis, hierarchical-bayesian

Interpretation of variables in multi-level regression with random effects


I have a dataset that looks like the one below (first 6 rows shown). CPA is the observed outcome of an experiment (the treatment column) run on different advertising flights. Flights are nested within campaigns.

  campaign_uid  flight_uid treatment         CPA
0   0C2o4hHDSN  0FBU5oULvg   control  -50.757370
1   0C2o4hHDSN  0FhOqhtsl9   control   10.963426
2   0C2o4hHDSN  0FwPGelRRX   exposed  -72.868952
3   0C5F8ZNKxc  0F0bYuxlmR   control   13.356081
4   0C5F8ZNKxc  0F2ESwZY22   control  141.030900
5   0C5F8ZNKxc  0F5rfAOVuO   exposed   11.200450

I fit a model like the following one:

model.fit('CPA ~ treatment',  random=['1|campaign_uid'])

To my knowledge, this model simply says:

  • We have a slope for treatment
  • We have a global intercept
  • We also have an intercept per campaign

so I would expect to get one posterior for each of these parameters.

However, looking at the results below, I also get posteriors for the following variable: 1|campaign_uid_offset. What does it represent?

[Image: trace plot of the fitted model]

Code for fitting the model and the plot:

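import pymc3 as pm
from bambi import Model

metric  = 'CPA'  # name of the response column in df
model   = Model(df)
results = model.fit('{} ~ treatment'.format(metric),
                    random=['1|campaign_uid'],
                    samples=1000)
# Plot the raw PyMC3 trace (this also shows internal parameters such as *_offset)
pm.traceplot(model.backend.trace)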

Solution

    • 1|campaign_uid

    These are the random intercepts for campaigns that you mentioned in your list of parameters.

    • 1|campaign_uid_sd

    This is the standard deviation of the aforementioned random campaign intercepts.

    • CPA_sd

    This is the residual standard deviation. That is, for flight i in campaign j, your model can be written (in part) as CPA_ij ~ Normal(b0 + b1*treatment_ij + u_j, sigma^2), where u_j is the random intercept for campaign j, and CPA_sd represents the parameter sigma.

    • 1|campaign_uid_offset

    This is an alternative parameterization of the random intercepts (a so-called non-centered parameterization). bambi uses this transformation internally to improve MCMC sampling efficiency. These transformed parameters are normally hidden from the user: if you make the trace plot with results.plot() rather than pm.traceplot(model.backend.trace), they are not shown unless you specify transformed=True (it is False by default). They are also hidden by default in the results.summary() output. For more information about this transformation, see this nice blog post by Thomas Wiecki.
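
As a rough illustration of how the offset terms relate to the random intercepts (a minimal sketch, not bambi's actual code): in a non-centered parameterization, the sampler works with standardized offsets drawn from Normal(0, 1), and each random intercept is obtained by multiplying its offset by the group-level standard deviation. The NumPy snippet below uses made-up values for `campaign_sd` and `n_campaigns`, and the helper `noncentered_intercepts` is a hypothetical name introduced only for this example.

```python
import numpy as np

def noncentered_intercepts(offset, sd):
    """Map standardized offsets to random intercepts: u_j = sd * offset_j.

    Hypothetical helper for illustration only; not part of bambi or PyMC3.
    """
    return sd * offset

rng = np.random.default_rng(0)

n_campaigns = 100_000   # assumed (large, so the sample statistics are stable)
campaign_sd = 25.0      # assumed value, plays the role of 1|campaign_uid_sd

# Standardized offsets, playing the role of 1|campaign_uid_offset
offset = rng.normal(loc=0.0, scale=1.0, size=n_campaigns)

# Random intercepts, playing the role of 1|campaign_uid
intercepts = noncentered_intercepts(offset, campaign_sd)

# The intercepts have mean close to 0 and standard deviation close to
# campaign_sd, just as if they had been drawn from Normal(0, campaign_sd).
print(intercepts.mean(), intercepts.std())
```

Both forms describe the same distribution for the intercepts, so the offsets carry no extra information: dividing an intercept by the standard deviation recovers its offset. This is why the offset terms can safely be ignored when interpreting the results.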