Continous Parent and Discrete Child Conditional with Observed Data in PyMC

I am new to PyMC and trying to set up the simple conditional probability model: P(has_diabetes|bmi, race). Race can take on 5 discrete values encoded as 0-4 and BMI can take on a non-zero positive real number. So far I have the parent variables setup like this:

p_race = [0.009149232914923292,
          0.15656903765690378,
          0.019637377963737795,
          0.013947001394700141,
          0.800697350069735]
race = pymc.Categorical('race', p_race)

bmi_alpha = pymc.Exponential('bmi_alpha', 1)
bmi_beta = pymc.Exponential('bmi_beta', 1)
bmi = pymc.Gamma('bmi', bmi_alpha, bmi_beta, value=bmis, observed=True)

I have observed data that looks like:

| bmi | race | has_diabetes |
| 21.7 | 1 | 0 |
| 45.3 | 4 | 1 |
| 18.9 | 2 | 0 |
| 26.6 | 0 | 0 |
| 35.1 | 4 | 0 |

I am attempting to model has_diabetes as:

has_diabetes = pymc.Bernoulli('has_diabetes', p_diabetes, value=data, observed=True)

My problem is that I am not sure how to construct the p_diabetes function since it is dependent on the values of race and and the continuous value of bmi.

Solution

You need to construct a deterministic function that generates p_diabetes as a function of your predictors. The safest way to do this is via a logit-linear transformation. For example:

intercept = pymc.Normal('intercept', 0, 0.01, value=0)
beta_race = pymc.Normal('beta_race', 0, 0.01, value=np.zeros(4))
beta_bmi = pymc.Normal('beta_bmi', 0, 0.01, value=0)

@pymc.deterministic
def p_diabetes(b0=intercept, b1=beta_race, b2=beta_bmi):

    # Prepend a zero for baseline
    b1 = np.append(0, b1)

    # Logit-linear model
    return pymc.invlogit(b0 + b1[race] + b2*bmi)

I would make the baseline race be the largest group (it is assumed to be index 0 in this example).

Actually, its not clear what the first part of the model above is for, specifically, why you are building models for the predictors, but perhaps I am missing something.