I'm looking for an efficient function to automatically produce betas for every possible multiple regression model given a dependent variable and set of predictors as a DataFrame in python.
For example, given this set of data:
https://i.sstatic.net/YuPuv.jpg
The dependent variable is 'Cases per Capita' and the columns following are the predictor variables.
In a simpler example:
Student Grade Hours Slept Hours Studied ...
--------- -------- ------------- --------------- -----
A 90 9 1 ...
B 85 7 2 ...
C 100 4 5 ...
... ... ... ... ...
where the beta matrix output would look as such:
Regression Hours Slept Hours Studied
------------ ------------- ---------------
1 # N/A
2 N/A #
3 # #
The table size would be [2^n - 1]
where n
is the number of variables, so in the case with 5 predictors and 1 dependent, there would be 31 regressions, each with a different possible combination of beta
calculations.
The process is described in greater detail here and an actual solution that is written in R is posted here.
I am not aware of any package that already does this. But you can create all those combinations (2^n-1), where n is the number of columns in X (independent variables), and fit a linear regression model for each combination and then get coefficients/betas for each model.
Here is how I would do it, hope this helps
from sklearn import datasets, linear_model
import numpy as np
from itertools import combinations
#test dataset
X, y = datasets.load_boston(return_X_y=True)
X = X[:,:3] # Orginal X has 13 columns, only taking n=3 instead of 13 columns
#create all 2^n-1 (here 7 because n=3) combinations of columns, where n is the number of features/indepdent variables
all_combs = []
for i in range(X.shape[1]):
all_combs.extend(combinations(range(X.shape[1]),i+1))
# print 2^n-1 combinations
print('2^n-1 combinations are:')
print(all_combs)
## Create a betas/coefficients as zero matrix with rows (2^n-1) and columns equal to X
betas = np.zeros([len(all_combs), X.shape[1]])+np.NaN
## Fit a model for each combination of columns and add the coefficients into betas matrix
lr = linear_model.LinearRegression()
for regression_no, comb in enumerate(all_combs):
lr.fit(X[:,comb], y)
betas[regression_no, comb] = lr.coef_
## Print Coefficients of each model
print('Regression No'.center(15)+" ".join(['column {}'.format(i).center(10) for i in range(X.shape[1])]))
print('_'*50)
for index, beta in enumerate(betas):
print('{}'.format(index + 1).center(15), " ".join(['{:.4f}'.format(beta[i]).center(10) for i in range(X.shape[1])]))
results in
2^n-1 combinations are:
[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
Regression No column 0 column 1 column 2
__________________________________________________
1 -0.4152 nan nan
2 nan 0.1421 nan
3 nan nan -0.6485
4 -0.3521 0.1161 nan
5 -0.2455 nan -0.5234
6 nan 0.0564 -0.5462
7 -0.2486 0.0585 -0.4156