I want to run a probit regression (see the model below) in which two dummy variables (gender x treatment) interact. However, when I include treatment*gender in the model, I get the error message "categorical data cannot be >1-dimensional". I can approximately reproduce the results from other analysis software by creating an additional dummy that takes the value 1 if both dummies equal 1, but I don't think this is quite correct with respect to the underlying statistical model: the marginal effect of the interaction would have to be the cross-derivative with respect to treatment and gender, not simply the derivative with respect to the co-occurrence dummy. Unfortunately, I can't find anything about this in the statsmodels documentation. How do I correctly include the interaction term?
Thank you very much!
from statsmodels.formula.api import probit
# This call raises the error:
probit_model = probit("decision ~ gender + treatment1 + treatment2 + control1 + control2 + contorl3"
                      " + treatment1*Gender + treatment2*Gender", data_probit).fit()
# Workaround with manually created interaction dummies:
probit_model = probit("decision ~ gender + treatment1 + treatment2 + treatment1_Gender"
                      " + treatment2_gender + control1 + control2 + contorl3", data_probit).fit()
Try the following:
probit_model = probit("decision ~ control1 + control2 + contorl3"
                      " + treatment1*Gender + treatment2*Gender", data_probit).fit()
Looking at patsy's documentation here, it looks like patsy automatically adds the main effects when using *. That means that treatment1*gender is automatically interpreted as treatment1 + gender + treatment1:gender. The first two terms are already in your formula, so they are repeated.
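To check that expansion yourself, here is a minimal sketch, assuming patsy and pandas are installed. The toy DataFrame df and its values are made up for illustration; only the column names mirror your formula. It builds the design matrix for the * form and for the spelled-out form, and also compares the interaction column with a hand-made product dummy.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data: two 0/1 dummies named like the columns in the question.
df = pd.DataFrame({
    "gender":     [0, 0, 1, 1, 0, 1],
    "treatment1": [0, 1, 0, 1, 1, 0],
})

# a*b is shorthand for a + b + a:b
star = dmatrix("treatment1 * gender", df, return_type="dataframe")
explicit = dmatrix("treatment1 + gender + treatment1:gender", df,
                   return_type="dataframe")

# Listing the main effects again next to the * term adds no extra columns.
redundant = dmatrix("gender + treatment1 + treatment1 * gender", df,
                    return_type="dataframe")

# For two numeric 0/1 dummies, the interaction column is their product.
product_dummy = (df["treatment1"] * df["gender"]).to_numpy()
interaction = star["treatment1:gender"].to_numpy()

print(list(star.columns))
print(list(redundant.columns))
print((interaction == product_dummy).all())
```

If the columns are numeric 0/1 dummies as assumed here, the interaction column is the same as the manually created dummy, so the two model specifications should differ only in how the marginal effects are computed afterwards, not in the fitted coefficients.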
If this does not work, please provide a minimal reproducible example including data. See here.