Search code examples
probabilitypymc

Continous Parent and Discrete Child Conditional with Observed Data in PyMC


I am new to PyMC and trying to set up the simple conditional probability model: P(has_diabetes|bmi, race). Race can take on 5 discrete values encoded as 0-4 and BMI can take on a non-zero positive real number. So far I have the parent variables setup like this:

p_race = [0.009149232914923292,
          0.15656903765690378,
          0.019637377963737795,
          0.013947001394700141,
          0.800697350069735]
race = pymc.Categorical('race', p_race)

bmi_alpha = pymc.Exponential('bmi_alpha', 1)
bmi_beta = pymc.Exponential('bmi_beta', 1)
bmi = pymc.Gamma('bmi', bmi_alpha, bmi_beta, value=bmis, observed=True)

I have observed data that looks like:

| bmi | race | has_diabetes |
| 21.7 | 1 | 0 |
| 45.3 | 4 | 1 |
| 18.9 | 2 | 0 |
| 26.6 | 0 | 0 |
| 35.1 | 4 | 0 |

I am attempting to model has_diabetes as:

has_diabetes = pymc.Bernoulli('has_diabetes', p_diabetes, value=data, observed=True)

My problem is that I am not sure how to construct the p_diabetes function since it is dependent on the values of race and and the continuous value of bmi.


Solution

  • You need to construct a deterministic function that generates p_diabetes as a function of your predictors. The safest way to do this is via a logit-linear transformation. For example:

    intercept = pymc.Normal('intercept', 0, 0.01, value=0)
    beta_race = pymc.Normal('beta_race', 0, 0.01, value=np.zeros(4))
    beta_bmi = pymc.Normal('beta_bmi', 0, 0.01, value=0)
    
    @pymc.deterministic
    def p_diabetes(b0=intercept, b1=beta_race, b2=beta_bmi):
    
        # Prepend a zero for baseline
        b1 = np.append(0, b1)
    
        # Logit-linear model
        return pymc.invlogit(b0 + b1[race] + b2*bmi)
    

    I would make the baseline race be the largest group (it is assumed to be index 0 in this example).

    Actually, its not clear what the first part of the model above is for, specifically, why you are building models for the predictors, but perhaps I am missing something.