I have a variable cols that contains a list of column names for my table.
Now I want to run a regression on my table by looping through the different columns in the cols variable.
I am trying to use the statsmodels formula API (Patsy) but am unable to construct a proper formula.
The code that I am trying right now is:
model = smf.ols(formula="Annual_Sales ~ Q('cols')", data=df).fit()
But this obviously does not work, as cols is not a column in my df table.
Any suggestions on how I can do this, preferably with a for loop? I have 150 columns and can't manually enter all those names in the formula.
Thank you
One way I was able to solve this problem was with string formatting, since the formula passed to statsmodels is just a string.
So if we have,
col = ["a", "b", "c", "d"]
We can write,
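import statsmodels.formula.api as smf

for i in range(0, len(col) - 1):
    for j in range(i + 1, len(col)):
        # Wrap both column names in Q() so each is looked up as a column of df
        model = smf.ols(formula="Annual_Sales ~ Q('{}') + Q('{}')".format(col[i], col[j]), data=df).fit()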
This lets us loop through the list col, taking two columns at a time as the predictors for each model. Note that model is overwritten on every pass, so store each fitted result if you need it after the loop.
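The question asks for one column at a time, so here is a minimal sketch of the same string-formatting idea generalized to any number of columns per model, with every fitted result kept. The helper names build_formulas and fit_all and the n_terms parameter are made up for illustration and are not part of statsmodels. It also assumes the column names contain no double-quote characters.

```python
from itertools import combinations


def build_formulas(response, columns, n_terms=1):
    """Return one Patsy formula string per combination of n_terms columns."""
    formulas = []
    for combo in combinations(columns, n_terms):
        # Q("...") quotes each name so columns with spaces or symbols still parse.
        rhs = " + ".join('Q("{}")'.format(name) for name in combo)
        formulas.append("{} ~ {}".format(response, rhs))
    return formulas


def fit_all(df, response, columns, n_terms=1):
    """Fit one OLS model per formula and return them keyed by formula string."""
    # Imported here so build_formulas can be used without statsmodels installed.
    import statsmodels.formula.api as smf

    return {
        formula: smf.ols(formula=formula, data=df).fit()
        for formula in build_formulas(response, columns, n_terms)
    }


col = ["a", "b", "c", "d"]
print(build_formulas("Annual_Sales", col))
```

With the table from the question, fit_all(df, "Annual_Sales", cols) should fit one model per column, and passing n_terms=2 should give the same pairs as the nested loop. The results are keyed by formula string, so each model can be looked up after the loop finishes.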