I have a dataframe that looks like this:
I am running a logit model with fluid as the dependent variable, excluding vp and perip:
import statsmodels.formula.api as smf

model = smf.logit('''fluid ~ C(examq3_n, Treatment(reference = 2.0)) + C(pmhq3_n) + C(fluidq3_n) + C(mapq3_n, Treatment(reference = 3.0)) +
C(examq6_n, Treatment(reference = 2.0)) + C(pmhq6_n) + C(fluidq6_n) + C(mapq6_n, Treatment(reference = 3.0)) +
C(case, Treatment(reference = 2))''',
data = case1_2_vars).fit()
print(model.summary())
I get the following results:
I am wondering if I need to add a constant to the data and, if so, how. I've tried adding a column called const to the dataframe, set equal to 1, but when I then add const to the logit formula I get LinAlgError: Singular matrix, and I don't know how to add it using sm.add_constant() because I have had to specify the categorical variables and their respective reference levels in the formula, rather than defining x and y separately and simply passing those to the smf.logit() call.
My questions are: a) do I need to add a constant, and b) how? There are some links online that seem to imply it might not be necessary for a categorical variable-based logit model, but I would rather do it if it's best practice.
I'm also wondering whether statsmodels automatically includes a constant, because Intercept is listed in the results.
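If you use formulas (e.g. smf.logit or sm.Logit.from_formula), the formula handling by patsy automatically adds a constant/intercept. That is why Intercept appears in the summary, and why adding your own const column produces a singular matrix: it duplicates the intercept column that is already there.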
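To see this for yourself, here is a minimal sketch that builds the design matrix with patsy directly. The toy dataframe and the column names y, g and const are made up for illustration and are not taken from the question's data.

```python
import numpy as np
import pandas as pd
import patsy

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "y": [0, 1, 0, 1, 1, 0],
    "g": [1, 2, 3, 1, 2, 3],
})

# Build the right-hand-side design matrix the way the formula interface does.
X = patsy.dmatrix("C(g, Treatment(reference=2))", df, return_type="dataframe")
print(X.columns.tolist())  # the first column is 'Intercept'

# A manual column of ones is an exact copy of the intercept column.
df["const"] = 1.0
X_dup = patsy.dmatrix(
    "const + C(g, Treatment(reference=2))", df, return_type="dataframe"
)

# Fewer independent columns than columns means the matrix is rank deficient,
# which is what surfaces as "Singular matrix" during fitting.
rank = np.linalg.matrix_rank(X_dup.to_numpy())
print(X_dup.shape[1], rank)
```

The second design matrix has one more column than its rank, so the extra const column carries no new information.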
If you create a model without a formula, using numpy arrays or a pandas DataFrame (e.g. sm.Logit(y, x)), then statsmodels does not modify exog, i.e. users need to add a constant themselves. The helper function is sm.add_constant, which adds a column of ones to the array or DataFrame.