statsmodels: printing the summaries of several regression models together


In the Python library statsmodels, you can print the regression results with print(results.summary()). How can I print the summaries of more than one regression in a single table, for easier comparison?

A linear regression, code taken from statsmodels documentation:

import numpy as np
import statsmodels.api as sm

nsample = 100
x = np.linspace(0, 10, nsample)
X = np.column_stack((x, x**2))  # regressors: x and x squared, no intercept
beta = np.array([0.1, 10])
e = np.random.normal(size=nsample)
y = np.dot(X, beta) + e

model = sm.OLS(y, X)
results_noconstant = model.fit()

Then I add a constant to the model and run the regression again:

beta = np.array([1, 0.1, 10])  # intercept first, matching the column add_constant prepends
X = sm.add_constant(X)
y = np.dot(X, beta) + e

model = sm.OLS(y, X)
results_withconstant = model.fit()

I'd like to see the summaries of results_noconstant and results_withconstant printed in one table. This seems like a very useful feature, but I couldn't find anything about it in the statsmodels documentation.

EDIT: The regression table I had in mind would be something like this. I wonder whether there is ready-made functionality for it.


Solution

  • I am sure there are a number of ways to do that. Which one fits depends on what you can or want to use.

    The starting point most likely will be the same:

    In statsmodels, calling .fit() on a linear model returns a RegressionResults instance, whose summary2() method returns a Summary object with a few convenience methods and attributes.

    One of these, .tables, is a list of pandas DataFrames.

    Here is how you could use this:

    import pandas as pd
    results = {'Noconst': results_noconstant.summary2(),
               'withcon': results_withconstant.summary2()}
    frames = []
    for mod, summary in results.items():
        info = summary.tables[0]  # model-level info laid out as label/value column pairs
        for col in info.columns:
            if col % 2 == 0:  # even columns hold labels, the next column holds the values
                frames.append(pd.DataFrame({'Model': [mod] * info[col].size,
                                            'Param': info[col].values,
                                            'Value': info[col + 1].values}))
    # DataFrame.append was removed in pandas 2.0; pd.concat keeps each piece's own index
    df = pd.concat(frames)

    print(df)
    

    Which yields:

         Model                Param             Value
    0  Noconst               Model:               OLS
    1  Noconst  Dependent Variable:                 y
    2  Noconst                Date:  2016-01-29 00:33
    3  Noconst    No. Observations:               100
    4  Noconst            Df Model:                 2
    5  Noconst        Df Residuals:                98
    6  Noconst           R-squared:             1.000
    0  Noconst      Adj. R-squared:             1.000
    1  Noconst                 AIC:          296.0102
    2  Noconst                 BIC:          301.2205
    3  Noconst      Log-Likelihood:           -146.01
    4  Noconst         F-statistic:         9.182e+06
    5  Noconst  Prob (F-statistic):         4.33e-259
    6  Noconst               Scale:            1.1079
    0  withcon               Model:               OLS
    1  withcon  Dependent Variable:                 y
    2  withcon                Date:  2016-01-29 00:33
    3  withcon    No. Observations:               100
    4  withcon            Df Model:                 2
    5  withcon        Df Residuals:                97
    6  withcon           R-squared:             1.000
    0  withcon      Adj. R-squared:             1.000
    1  withcon                 AIC:          297.8065
    2  withcon                 BIC:          305.6220
    3  withcon      Log-Likelihood:           -145.90
    4  withcon         F-statistic:         4.071e+06
    5  withcon  Prob (F-statistic):         1.55e-239
    6  withcon               Scale:            1.1170
    

    What you can do with this is limited only by your familiarity with pandas, a powerful Python data analysis toolkit.