Search code examples
pythonbayesianpymc3

building PyMC3 model incorporating different measurements


I am trying to incorporate different types and replicates of measurements into one model in PyMC3.

Consider the following model: P(t)=P0*exp(-kBt) where P(t), P0, and B are concentrations. k is a rate. We measure P(t) at different times and B once, all through counting of particles. k is the parameter of interest we are trying to infer.

My question has two parts: (1) How to incorporate measurements on P(t) and B into one model? (2) How to use a variable number of replicate experiments to inform on the value of k?

I think I can answer part (1), but am unsure about whether it is right or done in the right flavour. I failed to generalise the code to include a variable number of replicates.

For one experiment (one replicate):

ts=np.asarray([time0,time1,...])
counts=np.asarray([countforB,countforPattime0,countforPattime1,...])
basic_model = pm.Model()
with basic_model:
    k=pm.Uniform('k',0,20)
    B=pm.Uniform('B',0,1000)
    P=pm.Uniform('P',0,1000)
    exprate=pm.Deterministic('exprate',k*B)
    modelmu=pm.math.concatenate(B*(np.asarray([1.0]),P*pm.math.exp(-exprate*ts)))
    Y_obs=pm.Poisson('Y_obs',mu=modelmu,observed=counts))

I tried to include different replicates along the lines of the above, but to no avail:

...
    k=pm.Uniform('k',0,20) # same within replicates
    B=pm.Uniform('B',0,1000,shape=numrepl) # can vary between expts.
    P=pm.Uniform('P',0,1000,shape=numrepl) # can vary between expts.
    exprate=???
    modelmu=???

Solution

  • Multiple Observables

    PyMC3 supports multiple observables, that is, you can add multiple RandomVariable objects to the graph with the observed argument set.

    Single Trial

    In your first case, this would lend some clarity to the model:

    counts=[countforPattime0, countforPattime1, ...]
    
    with pm.Model() as single_trial:
        # priors
        k = pm.Uniform('k', 0, 20)
        B = pm.Uniform('B', 0, 1000)
        P = pm.Uniform('P', 0, 1000)
    
        # transformed RVs
        rate = pm.Deterministic('exprate', k*B)
        mu = P*pm.math.exp(-rate*ts)
    
        # observations
        B_obs = pm.Poisson('B_obs', mu=B, observed=countforB)
        Y_obs = pm.Poisson('Y_obs', mu=mu, observed=counts)
    

    Multiple Trials

    With this additional flexibility, hopefully it makes the transition to multiple trials more obvious. It should go something like:

    B_cts = np.array(...) # shape (N, 1)
    Y_cts = np.array(...) # shape (N, M)
    ts = np.array(...)    # shape (1, M)
    
    with pm.Model() as multi_trial:
        # priors
        k = pm.Uniform('k', 0, 20)
        B = pm.Uniform('B', 0, 1000, shape=B_cts.shape)
        P = pm.Uniform('P', 0, 1000, shape=B_cts.shape)
    
        # transformed RVs
        rate = pm.Deterministic('exprate', k*B)
        mu = P*pm.math.exp(-rate*ts)
    
        # observations
        B_obs = pm.Poisson('B_obs', mu=B, observed=B_cts)
        Y_obs = pm.Poisson('Y_obs', mu=mu, observed=Y_cts)
    

    There might be some extra syntax stuff to get the matrices multiplying correctly, but this at least includes the correct shapes.


    Priors

    Once you get that setup working, it would be in your interest to reconsider the priors. I suspect you have more information about the typical values for those than is currently included, especially since this seems like a chemical or physical model.

    For instance, right now the model says,

    We believe the true value of B remains fixed for the duration of a trial, but across trials is a completely arbitrary value between 0 and 1000, and measuring it repeatedly within a trial would be Poisson distributed.

    Typically, one should avoid truncations unless they are excluding meaningless values. Hence, a lower bound of 0 is fine, but the upper bounds are arbitrary. I'd recommend having a look at the Stan Wiki on choosing priors.