
patsy formula - adding powers of a factor


I use patsy to build a design matrix, and I need to include powers of the original factors. For example, for the regression y~x1+x1^2+x2+x2^2+x2^3, I want to be able to write

patsy.dmatrix('y~x1 + x1**2 + x2 + x2**2 + x2**3', data)

where data is a dataframe that contains the columns y, x1, and x2. But it does not seem to work at all. Any solutions?


Solution

  • Patsy has a special interpretation of ** that it inherited from R: in a formula, x1**2 means x1 interacted with itself, which collapses back to plain x1, so no squared column is produced. I've considered making it automatically do the right thing when applied to numeric factors, but haven't actually implemented it. In the meantime, there's a general way to tell patsy to use the Python interpretation of operators instead of the patsy one: wrap the expression in I(...). So:

    # dmatrices (not dmatrix) because the formula has a left-hand side, y
    y, X = patsy.dmatrices('y~x1 + I(x1**2) + x2 + I(x2**2) + I(x2**3)', data)
    

    (More detailed explanation here)
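
To see the difference concretely, here is a minimal sketch on made-up data. The values in the dataframe are hypothetical; only the column names y, x1, and x2 come from the question. It assumes patsy, pandas, and numpy are installed, and it uses patsy.dmatrices because the formula has an outcome on the left-hand side (patsy.dmatrix expects a formula with predictors only).

```python
import numpy as np
import pandas as pd
import patsy

# Hypothetical toy data; only the column names come from the question.
data = pd.DataFrame({
    "y":  [1.0, 2.0, 3.0, 4.0],
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [0.5, 1.5, 2.5, 3.5],
})

# I(...) makes patsy evaluate ** as ordinary Python/NumPy exponentiation.
# dmatrices returns a pair: (outcome matrix, predictor matrix).
y, X = patsy.dmatrices(
    "y ~ x1 + I(x1**2) + x2 + I(x2**2) + I(x2**3)", data
)
print(X.design_info.column_names)  # intercept plus one column per term

# Without I(), x1**2 is read as x1 interacted with itself, which is just x1,
# so this matrix gets no squared column.
X_plain = patsy.dmatrix("x1 + x1**2", data)
print(X_plain.design_info.column_names)
```

Printing the column names is a quick way to check that each power shows up as its own column before passing the matrix to a model.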