python regression statsmodels

Naming explanatory variables in regression output


Each of my variables is a separate list.

I am using a method I found in another thread here.

import numpy as np
import statsmodels.api as sm

y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]

x = [
     [4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
     [4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
     [4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
     ]

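def reg_m(y, x):
    # Start with the first predictor plus an explicit column of ones (the intercept).
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    # Prepend each remaining predictor, so the columns end up in the
    # reverse order of x, with the constant last.
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results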

My only problem is that the explanatory variables in my regression output are labelled x1, x2, x3, etc. Is it possible to change these to more meaningful names?

Thanks


Solution

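  • Searching through the source, it appears that the summary() method accepts your own names for the explanatory variables through its xname argument. The names are matched to the columns of X in order, and reg_m stacks the columns as x[2], x[1], x[0], constant, so the last name labels the intercept. So: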

    results = sm.OLS(y, X).fit()
    print(results.summary(xname=['Fred', 'Mary', 'Ethel', 'Bob']))
    

    gives us:

                                    OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.535
    Model:                            OLS   Adj. R-squared:                  0.461
    Method:                 Least Squares   F-statistic:                     7.281
    Date:                Mon, 11 Apr 2016   Prob (F-statistic):            0.00191
    Time:                        22:22:47   Log-Likelihood:                -26.025
    No. Observations:                  23   AIC:                             60.05
    Df Residuals:                      19   BIC:                             64.59
    Df Model:                           3
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    Fred           0.2424      0.139      1.739      0.098        -0.049     0.534
    Mary           0.2360      0.149      1.587      0.129        -0.075     0.547
    Ethel         -0.0618      0.145     -0.427      0.674        -0.365     0.241
    Bob            1.5704      0.633      2.481      0.023         0.245     2.895
    ==============================================================================
    Omnibus:                        6.904   Durbin-Watson:                   1.905
    Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.708
    Skew:                          -0.849   Prob(JB):                       0.0950
    Kurtosis:                       4.426   Cond. No.                         38.6
    ==============================================================================
    
    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.