Search code examples
pythonpymc3arviz

Summarise the posterior of a single parameter from an array with arviz


I am estimating a model using the pyMC3 library in python. In my "real" model, there are four parameter arrays, two of which have over 170,000 parameters in them. Summarising this array of parameters is too computationally intensive on my computer. I have been trying to figure out if the summary function in arviz will allow me to only summarise one (or a small number) of parameters in the array. Below is a reprex where the same problem is present, though the model is a lot simpler. In the linear regression model below, the parameter array b has three parameters in it b[0], b[1], b[2]. I would like to know how to get the summary for just b[0] and b[1] or alternatively for just a single parameter, e.g., b[0].

import pandas as pd
import pymc3 as pm
import arviz as az

d = pd.read_csv("https://quantoid.net/files/mtcars.csv")

mpg = d['mpg'].values
hp = d['hp'].values
weight = d['wt'].values

with pm.Model() as model: 
    b = pm.Normal("b", mu=0, sigma=10, shape=3)
    sig = pm.HalfCauchy("sig", beta=2)
    mu = pm.Deterministic('mu', b[0] + b[1]*hp + b[2]*weight)
    like = pm.Normal('like', mu=mu, sigma=sig, observed=mpg)
    fit = pm.fit(10000, method='advi')
    samp = fit.sample(1500)    

with model: 
    smry = az.summary(samp, var_names = ["b"])

It looked like the coords argument to the summary() function would do it, but after googling around and finding a few examples, like the one here with plot_posterior() instead of summary(), I was unable to get something to work. In particular, I tried the following in the hopes that it would return the summary for b[0] and b[1].

with model: 
    smry = az.summary(samp, var_names = ["b"], coords={"b_dim_0": range(1)})

or this to return the summary of b[0]:

with model: 
    smry = az.summary(samp, var_names = ["b"], coords={"b_dim_0": [0]})

I suspect I am missing something simple (I'm an R user who dabbles occasionally with Python). Any help is greatly appreciated.

(BTW, I am using Python 3.8.0, pyMC3 3.9.3, arviz 0.10.0)


Solution

  • To use coords for this, you need to update to the development (which will still show 0.11.2 but has the code from github or any >0.11.2 release) version of ArviZ. Until 0.11.2, the coords argument in summary was not used to subset the data (like it did in all plotting functions) but instead it was only taken into account if the input was not already InferenceData in which case it was passed to the converter.

    With older versions, you need to use xarray to subset the data before passing it to summary. Therefore you need to explicitly convert the trace to inferencedata beforehand. In the example above it would look like:

    with model:
        ...
        samp = fit.sample(1500) 
        idata = az.from_pymc3(samp)
    
    az.summary(idata.posterior[["b"]].sel({"b_dim_0": [0]}))
    

    Moreover, you may also want to indicate summary to compute only a subset of the stats/diagnostics as shown in the docstring examples.