python regression data-analysis statsmodels interaction

Python: Probit Regression with interaction term of two categorical variables


I want to run a probit regression (see the model below) in which two dummy variables (gender x treatment) interact. However, when I include treatment*gender in the model, I get the error message "categorical data cannot be >1-dimensional". I can approximately reproduce the results from other analysis software by creating an additional dummy that takes the value 1 if both dummies equal 1, but I don't think this is quite correct with respect to the underlying statistical model: the marginal effect of the interaction should be the cross-derivative with respect to treatment and gender, not simply the derivative with respect to the co-occurrence dummy. Unfortunately, I can't find anything about this in the statsmodels documentation. How do I correctly include the interaction term?

Thank you very much!

from statsmodels.formula.api import probit

# Interaction via `*`: this call raises the error
probit_model = probit("decision ~ gender + treatment1 + treatment2 + "
                      "control1 + control2 + control3 + treatment1*Gender + treatment2*Gender",
                      data_probit).fit()

# Workaround: precomputed product dummies instead of `*`
probit_model = probit("decision ~ gender + treatment1 + treatment2 + "
                      "treatment1_Gender + treatment2_gender + control1 + control2 + control3",
                      data_probit).fit()
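
To make the cross-derivative point concrete: for two dummies, the interaction effect on the probability scale is usually taken as a double difference of predicted probabilities rather than a single coefficient. The sketch below illustrates this with the standard normal CDF from the standard library. The function name and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from my data.

```python
from statistics import NormalDist


def interaction_effect(b_treat, b_gender, b_inter, xb_rest=0.0):
    """Double difference of probit probabilities for two 0/1 dummies.

    All arguments are hypothetical coefficients on the index scale;
    xb_rest stands for the contribution of the remaining regressors.
    """
    phi = NormalDist().cdf  # standard normal CDF
    p11 = phi(xb_rest + b_treat + b_gender + b_inter)  # treated, gender = 1
    p10 = phi(xb_rest + b_treat)                       # treated, gender = 0
    p01 = phi(xb_rest + b_gender)                      # untreated, gender = 1
    p00 = phi(xb_rest)                                 # untreated, gender = 0
    return (p11 - p10) - (p01 - p00)


# Even with a zero interaction coefficient, the effect on the
# probability scale is generally not zero because the CDF is nonlinear.
print(interaction_effect(b_treat=1.0, b_gender=1.0, b_inter=0.0))
```

This is why I doubt that the coefficient on the product dummy alone can be read as the interaction effect.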

Solution

  • Try the following:

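    from statsmodels.formula.api import probit

    # a*b expands to a + b + a:b, so the main effects are not listed separately
    probit_model = probit("decision ~ control1 + control2 + control3 + "
                          "treatment1*Gender + treatment2*Gender",
                          data_probit).fit()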
    

    According to patsy's documentation, the * operator adds the main effects automatically: treatment1*Gender is interpreted as treatment1 + Gender + treatment1:Gender. The first two of these terms were therefore repeated in your formula. Note also that your formula mixes gender and Gender; names in a formula are case-sensitive, so use whichever matches the column name in data_probit.

    If this does not work, please provide a minimal reproducible example, including data.
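
One way to check how a formula is expanded is to build the design matrix directly with patsy and look at its column names. This is a minimal sketch, assuming patsy is installed; the small data dictionary is made up purely for illustration and the variable names are borrowed from the question.

```python
from patsy import dmatrix

# Tiny made-up data set with 0/1 dummies (illustrative values only)
data = {
    "gender":     [0, 0, 1, 1, 0, 1],
    "treatment1": [0, 1, 0, 1, 1, 0],
}

# Short form: `*` alone
short_form = dmatrix("treatment1*gender", data)
# Long form: main effects listed explicitly in addition to `*`
long_form = dmatrix("gender + treatment1 + treatment1*gender", data)

short_cols = short_form.design_info.column_names
long_cols = long_form.design_info.column_names
print(short_cols)
print(long_cols)
```

Both formulas should produce the same set of columns (intercept, the two main effects, and the treatment1:gender interaction), which suggests that listing the main effects again does not add anything to the model.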