Tags: python, linear-regression, python-itertools

Using itertools.combinations on the columns of a DataFrame while always including a specific column


I am trying to run a best-subset multiple regression using sm.OLS and itertools.combinations. I have added a constant column, but because itertools.combinations goes through every combination of columns, many of the combinations leave the constant term out.

To get around this, I would like every combination that itertools.combinations produces to include the constant column.

Currently, only some of the combinations in my results include const. How can I make every combination include the constant column?

Example of what I am looking for:

[('const', 'B', 'C'), ('const', 'B', 'D'), ('const', 'B', 'E'), ('const', 'B', 'F'), ('const', 'A', 'B'), ...]

Here is an example of what I currently have (the results were posted as an image):

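import itertools

import numpy as np
import pandas as pd

cols = ['A', 'B', 'C', 'D', 'E', 'F']
const = [1] * 12  # constant (intercept) column, one entry per row

ran = np.random.rand(12, 6)
df = pd.DataFrame(data=ran, columns=cols)
df['const'] = const
results = []
print(df)
# Every 3-column combination of all 7 columns, so 'const' appears in only some
for combo in itertools.combinations(df.columns, 3):
    results.append(combo)

print(results)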



Solution

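  • If I understand correctly, you can take the 2-column combinations of the other columns and prepend "const" to each one:

    results = []
    # df.columns[:-1] drops the last column ("const") so it is never picked twice
    for combo in itertools.combinations(df.columns[:-1], 2):
        results.append(["const", *combo])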
    
    print(results)
    

    Prints:

    [
        ["const", "A", "B"],
        ["const", "A", "C"],
        ["const", "A", "D"],
        ["const", "A", "E"],
        ["const", "A", "F"],
        ["const", "B", "C"],
        ["const", "B", "D"],
        ["const", "B", "E"],
        ["const", "B", "F"],
        ["const", "C", "D"],
        ["const", "C", "E"],
        ["const", "C", "F"],
        ["const", "D", "E"],
        ["const", "D", "F"],
        ["const", "E", "F"],
    ]
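
The slice df.columns[:-1] only works because "const" happens to be the last column. If that is not guaranteed, a position-independent variant could look like the sketch below. The helper name combos_with_fixed and its parameters are hypothetical, not part of itertools or pandas, and the column list is typed out by hand to mirror df.columns from the question.

```python
import itertools


def combos_with_fixed(columns, fixed, size):
    """Return every `size`-column combination that contains `fixed`."""
    # Hypothetical helper: set the fixed column aside, choose the remaining
    # size - 1 columns from the others, then put the fixed column in front.
    others = [c for c in columns if c != fixed]
    return [(fixed, *combo) for combo in itertools.combinations(others, size - 1)]


# Same column order as df.columns in the question, with "const" last.
columns = ["A", "B", "C", "D", "E", "F", "const"]
results = combos_with_fixed(columns, "const", 3)

print(len(results))  # 15, i.e. "6 choose 2"
print(results[0])    # ('const', 'A', 'B')
```

Each tuple can then be turned into a list to select the regressors for one fit, for example df[list(combo)] as the exog argument of sm.OLS, with the response variable supplied separately.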