python, pandas, formula, lazy-evaluation, patsy

Create a custom function in Patsy


import pandas as pd
import patsy
from patsy import dmatrices, dmatrix, demo_data

dt = pd.DataFrame({'F1': ['a', 'b', 'c', 'd', 'e', 'a'], 'F2': ['X', 'X', 'Y', 'Y', 'Z', 'Z']})

I know I can do this:

dmatrix("1+I(F1=='a')",dt)

But can I use an arbitrary function of my own in a patsy formula? I'm trying to mimic the flexibility of R's formula language, but it doesn't seem straightforward to achieve in Python. For example, I'd like to write something like this:

def abd(x):
    return 1 if x in ['a','b','d'] else 0

dmatrix("1+abd(F1)",dt)

Solution

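  • If I understand correctly, you want a column that flags rows where F1 is 'a', 'b', or 'd'. patsy calls your function with the whole column (a pandas Series), not one value at a time, so use a vectorized check: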

    def abd(x):
        return x.isin(['a','b','d'])
    dmatrix("1+abd(F1)",dt)
    Out[182]: 
    DesignMatrix with shape (6, 2)
      Intercept  abd(F1)[T.True]
              1                1
              1                1
              1                0
              1                1
              1                0
              1                1
      Terms:
        'Intercept' (column 0)
        'abd(F1)' (column 1)
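
As a follow-up sketch (not part of the original answer): returning booleans makes patsy treat the result as a categorical factor, which is why the column is labelled abd(F1)[T.True]. If a plain numeric 0/1 column is preferred, casting to int should give a column named abd(F1) instead. The second variant assumes you want to keep the scalar-style function from the question; is_abd and abd_scalar are hypothetical names introduced here, and np.vectorize is just a convenience wrapper around a Python loop, not a speed-up.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

dt = pd.DataFrame({'F1': ['a', 'b', 'c', 'd', 'e', 'a'],
                   'F2': ['X', 'X', 'Y', 'Y', 'Z', 'Z']})

def abd(x):
    # x is the whole F1 column; return 0/1 integers so patsy
    # treats the result as a numeric factor, not a categorical one
    return pd.Series(x).isin(['a', 'b', 'd']).astype(int)

design = dmatrix("1 + abd(F1)", dt)
print(design.design_info.column_names)
print(np.asarray(design)[:, 1])

def is_abd(value):
    # scalar version, in the style of the function from the question
    return 1 if value in ('a', 'b', 'd') else 0

# np.vectorize applies the scalar function element by element
abd_scalar = np.vectorize(is_abd)

design_scalar = dmatrix("1 + abd_scalar(F1)", dt)
print(np.asarray(design_scalar)[:, 1])
```

Both formulas work because patsy looks up function names in the environment where dmatrix is called, so any function visible at the call site can be used inside the formula string.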