I am trying to run a best-subset multiple regression using sm.OLS and itertools.combinations. I have added the constant, but because itertools.combinations loops through all column combinations, it sometimes excludes the constant term.
To get around this problem, I am trying to use itertools.combinations so that the constant column is always included in every combination.
However, only some of the resulting combinations include the constant. How can I make every combination contain the constant column?
Example of what I am looking for:
[('const', 'B', 'C'), ('const', 'B', 'D'), ('const', 'B', 'E'), ('const', 'B', 'F'), ('const', 'A', 'B'),
Here is an example of what I currently have:
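import itertools
import numpy as np
import pandas as pd
cols = ['A', 'B', 'C', 'D', 'E', 'F']
const = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
ran = np.random.rand(12, 6)
df = pd.DataFrame(data=ran, columns=cols)
df['const'] = const  # constant (intercept) column, appended last
results = []
print(df)
# every 3-column combination, whether or not it contains 'const'
for combo in itertools.combinations(df.columns, 3):
    results.append(combo)
print(results)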
IIUC, you can take the 2-column combinations of the other columns and prepend "const" to each one:
results = []
for combo in itertools.combinations(df.columns[:-1], 2):  # [:-1] excludes "const", which is the last column
    results.append(["const", *combo])
print(results)
Prints:
[
["const", "A", "B"],
["const", "A", "C"],
["const", "A", "D"],
["const", "A", "E"],
["const", "A", "F"],
["const", "B", "C"],
["const", "B", "D"],
["const", "B", "E"],
["const", "B", "F"],
["const", "C", "D"],
["const", "C", "E"],
["const", "C", "F"],
["const", "D", "E"],
["const", "D", "F"],
["const", "E", "F"],
]
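If you would rather not depend on "const" being the last column, one option is to drop it by name before building the combinations. This is only a minimal sketch: the helper name combos_with_fixed and its parameters are made up for illustration, and it uses a plain list of column names instead of df.columns so that it runs without pandas.

```python
import itertools


def combos_with_fixed(columns, fixed, size):
    """Return every `size`-length combination of `columns` that contains `fixed`."""
    # Drop the fixed column by name, so its position in `columns` does not matter.
    others = [c for c in columns if c != fixed]
    # Choose the remaining size - 1 members, then prepend the fixed column.
    return [(fixed, *rest) for rest in itertools.combinations(others, size - 1)]


# "const" is deliberately not the last entry here.
cols = ["A", "B", "const", "C", "D", "E", "F"]
results = combos_with_fixed(cols, "const", 3)
print(len(results))  # 15, i.e. C(6, 2)
print(results[:3])   # [('const', 'A', 'B'), ('const', 'A', 'C'), ('const', 'A', 'D')]
```

With a DataFrame you could pass list(df.columns) as the first argument. To cover every subset size for a best-subset search, call the helper in a loop over size from 2 up to len(cols).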