
Is there a statsmodels formula equivalent of R's glm shortcut y ~ .?


I have a dataframe containing the following columns:

y as the dependent variable
A, B, C, D, E, F as the independent variables.

I want to fit a regression with the statsmodels module without having to write out the formula argument in full, like this:

formula = 'y ~ A + B + C + D + E + F'

R's glm function offers a shortcut for this: formula = y ~ .

I was wondering whether statsmodels has a similar shortcut.

P.S.: the actual dataframe that I'm working with has 27 variables.


Solution

  • There is no shortcut like "." in patsy's formula handling, which is what statsmodels uses.

    However, building the formula with Python string manipulation is simple.

    Here is an example that I'm currently using. DATA is my dataframe, docvis is the outcome variable, and there is a constant column, const, that is not needed in the formula.

    formula = "docvis ~ " + " + ".join([i for i in DATA if i not in ["docvis", "const"]])
    formula
    'docvis ~ offer + ssiratio + age + educyr + physician + nonphysician + medicaid + private + female + phylim + actlim + income + totchr + insured + age2 + linc + bh + ldocvis + ldocvisa + docbin + aget + aget2 + incomet'
    

    It would be more explicit to iterate over the column names directly, with DATA.columns.

    In modern Python we also don't need to build an intermediate list with a list comprehension; a generator expression is enough:

    formula = "docvis ~ " + " + ".join(i for i in DATA.columns if i not in ["docvis", "const"])
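
For the dataframe in the question, the same idea can be wrapped in a small helper. This is only a sketch: build_formula and its parameters are hypothetical names introduced here, not part of statsmodels or patsy, and the column list stands in for df.columns. It also assumes every column name is a valid Python identifier.

```python
def build_formula(columns, response, exclude=()):
    """Build 'response ~ a + b + ...' from every column except the
    response itself and anything listed in `exclude`.

    Hypothetical helper for illustration; not a statsmodels function.
    """
    skip = {response, *exclude}
    terms = [name for name in columns if name not in skip]
    if not terms:
        raise ValueError("no predictor columns left for the right-hand side")
    return f"{response} ~ " + " + ".join(terms)


# Stand-in for df.columns from the question, plus a constant column to drop.
columns = ["y", "A", "B", "C", "D", "E", "F", "const"]
formula = build_formula(columns, response="y", exclude=["const"])
print(formula)  # y ~ A + B + C + D + E + F
```

With a real dataframe, pass df.columns as the first argument and hand the resulting string to statsmodels.formula.api, for example smf.ols(formula, data=df) or smf.glm(formula, data=df, family=...). Column names that are not valid Python identifiers would need to be wrapped in patsy's Q("...") quoting, which this sketch does not do.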