python pandas scikit-learn patsy

Mapping dummy variables in pandas data frame


I have a large DataFrame with 11 columns. I need to convert categorical variables into binary values, so I used Patsy:

from patsy import dmatrices

attributes = "admit ~ C(gender) + age + C(ethnicity) + C(state) + gpa + sci_gpa + mcat + C(major) + C(tier) + C(same_ins)"
y, X = dmatrices(attributes, df, return_type="dataframe")

This works well. However, I want to test a new sample that is stored in the format of the original DataFrame, e.g.:

gender    age    ethnicity    state    gpa    sci_gpa    gre    major    tier    same_ins
male      21     Asian        NV       3.4    3.2        .99    Physics  1       1     

Is there an easy way to convert this into the same format as X?


Solution

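  • I figured it out. Patsy stores the metadata for the design matrix (in the question's code it is available as X.design_info). That metadata can be reused on new data by passing it, together with the new sample, to

    build_design_matrices()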
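
A minimal sketch of how this might look in practice. The training data here is made up and uses only a subset of the question's columns, and the names train, new_sample and X_new are hypothetical. It assumes a Patsy version in which build_design_matrices() accepts the design_info object directly.

```python
import pandas as pd
from patsy import dmatrices, build_design_matrices

# Made-up training data (illustrative only, not the asker's data).
train = pd.DataFrame({
    "admit": [1, 0, 1, 0],
    "gender": ["male", "female", "female", "male"],
    "age": [21, 23, 22, 24],
    "state": ["NV", "CA", "NV", "CA"],
    "gpa": [3.4, 3.1, 3.8, 2.9],
})

formula = "admit ~ C(gender) + age + C(state) + gpa"
y, X = dmatrices(formula, train, return_type="dataframe")

# A new sample stored in the original (un-encoded) format.
new_sample = pd.DataFrame({
    "gender": ["male"],
    "age": [21],
    "state": ["NV"],
    "gpa": [3.4],
})

# X.design_info remembers the category levels and column layout learned
# from the training data, so the new row is encoded the same way.
# build_design_matrices() takes a list of design infos and returns a list.
X_new = build_design_matrices(
    [X.design_info], new_sample, return_type="dataframe"
)[0]

print(X_new)
```

Because the category levels come from the stored metadata rather than from the new sample, a single row still gets the full set of dummy columns. A category value that never appeared in the training data would raise an error rather than silently produce a new column.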