python statistics regression statsmodels

Estimating multiple parameters of a model in Python


I'm wondering what's the most efficient/accurate way to estimate the parameters (a0, a1, a2, a3) of the following model in Python:

col_4 = a0 + a1*col_1 + a2*col_2 + a3*col_3

The sample dataset would be:

import numpy as np
import pandas as pd

inputs = {
    'col_1': np.random.normal(15, 2, 100),
    'col_2': np.random.normal(15, 1, 100),
    'col_3': np.random.normal(0.9, 1, 100),
    'col_4': np.random.normal(-0.05, 0.5, 100),
}

# 100 daily timestamps, one per row
_idx = pd.date_range('2021-01-01', '2021-04-10', freq='D').to_series()
data = pd.DataFrame(inputs, index=_idx)

Solution

  • statsmodels provides a pretty simple way to estimate linear models like that:

    import statsmodels.formula.api as smf
    
    results = smf.ols('col_4 ~ col_1 + col_2 + col_3', data=data).fit()
    print(results.summary())
    

    The coef column shows your parameter estimates (Intercept is a0; col_1, col_2 and col_3 are a1, a2 and a3):

                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                  col_4   R-squared:                       0.049
    Model:                            OLS   Adj. R-squared:                  0.019
    Method:                 Least Squares   F-statistic:                     1.637
    Date:                Wed, 29 Dec 2021   Prob (F-statistic):              0.186
    Time:                        17:25:00   Log-Likelihood:                -68.490
    No. Observations:                 100   AIC:                             145.0
    Df Residuals:                      96   BIC:                             155.4
    Df Model:                           3                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    Intercept      0.2191      0.846      0.259      0.796      -1.461       1.899
    col_1         -0.0198      0.023     -0.854      0.395      -0.066       0.026
    col_2         -0.0048      0.051     -0.093      0.926      -0.107       0.097
    col_3          0.1155      0.056      2.066      0.042       0.005       0.226
    ==============================================================================
    Omnibus:                        2.292   Durbin-Watson:                   2.291
    Prob(Omnibus):                  0.318   Jarque-Bera (JB):                2.296
    Skew:                          -0.351   Prob(JB):                        0.317
    Kurtosis:                       2.757   Cond. No.                         370.
    ==============================================================================
    
    Notes:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
    

    The fit includes the intercept (a0) by default. If you want to remove it, just add - 1 to the formula, e.g. 'col_4 ~ col_1 + col_2 + col_3 - 1'.
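
If you need the estimates as numbers rather than a printed table, something like the following should work. It is only a sketch: the seeded generator, the names `results_no_const`, `X` and `beta`, and the `np.linalg.lstsq` cross-check are assumptions added for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data shaped like the question's sample; the seed is an assumption
# added here only so that repeated runs give the same numbers.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'col_1': rng.normal(15, 2, 100),
    'col_2': rng.normal(15, 1, 100),
    'col_3': rng.normal(0.9, 1, 100),
    'col_4': rng.normal(-0.05, 0.5, 100),
})

# Fit with the intercept (the default) and read the estimates as a pandas Series.
results = smf.ols('col_4 ~ col_1 + col_2 + col_3', data=data).fit()
params = results.params  # index: Intercept, col_1, col_2, col_3
a0, a1, a2, a3 = params[['Intercept', 'col_1', 'col_2', 'col_3']]

# Same model without the intercept: "- 1" drops a0 from the formula.
results_no_const = smf.ols('col_4 ~ col_1 + col_2 + col_3 - 1', data=data).fit()

# Cross-check the first fit with plain least squares on [1, col_1, col_2, col_3].
X = np.column_stack([np.ones(len(data)),
                     data[['col_1', 'col_2', 'col_3']].to_numpy()])
beta, *_ = np.linalg.lstsq(X, data['col_4'].to_numpy(), rcond=None)

print(params)
print(results_no_const.params)
```

`results.params` is indexed by term name, so individual coefficients can be looked up with `params['col_1']`, and `results.conf_int()` gives the interval columns from the summary table.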