python linear-regression statsmodels

Is noise included in the predicted and fitted values in statsmodels?


I am performing OLS regression with a country-specific effect (LSDV) on my panel data (dataframe). This is my result:

============================== OLSR With Dummies ==============================
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    ELC   R-squared:                       0.969
Model:                            OLS   Adj. R-squared:                  0.968
Method:                 Least Squares   F-statistic:                     1185.
Date:                Fri, 11 Mar 2022   Prob (F-statistic):               0.00
Time:                        10:13:02   Log-Likelihood:                -5120.2
No. Observations:                5237   AIC:                         1.051e+04
Df Residuals:                    5101   BIC:                         1.140e+04
Df Model:                         135                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.8980      0.130     22.355      0.000       2.644       3.152
IWW            0.6373      0.011     55.687      0.000       0.615       0.660
GDPC           0.7915      0.016     50.249      0.000       0.761       0.822
CDD            0.0333      0.007      4.750      0.000       0.020       0.047
HDD            0.1124      0.008     14.793      0.000       0.097       0.127
TIME           0.3588      0.013     28.110      0.000       0.334       0.384
AGO           -4.1382      0.147    -28.187      0.000      -4.426      -3.850
ALB           -7.4068      0.166    -44.670      0.000      -7.732      -7.082

I obtain the fitted values with df_results.fittedvalues or df_results.predict(exog). To check that my calculation is correct, I want to compare a manually calculated y with y_fittedvalue. For example, for ALB: y = 0.637*IWW + 0.7915*GDPC + 0.0333*CDD + 0.1124*HDD + 0.3588*TIME + (2.8980-7.4068). The two values differ slightly, by about 2-3% (y=5.68 and y_fittedvalue=5.79). I guess the difference comes from noise (error), but I couldn't find any source or evidence for that. I would appreciate it if anyone could explain what causes this difference. And if it comes from noise, how can I get the noise value?


Solution

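  • Your manual calculation uses the rounded coefficients printed in the summary table. To reproduce the fitted values exactly, use the full-precision coefficients stored in df_results.params, multiply them by the matching columns of the design matrix (including the intercept and the country dummies), and sum across columns:

    (df_results.params * exog).sum(axis=1)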
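As a minimal sketch of the point above, the following uses synthetic data, since the original panel is not available. The column names x1, x2, country and the variable names are hypothetical. It assumes the model was fit through the formula interface, and it rebuilds the design matrix from res.model.exog and res.model.exog_names. It compares the fitted values against a manual calculation with full-precision and with rounded coefficients. It also shows that the fitted values contain no noise term: the residuals (observed minus fitted) are available separately as res.resid.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the panel data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(10, 2, n),
    "x2": rng.normal(5, 1, n),
    "country": rng.choice(["AGO", "ALB", "ARG"], n),
})
df["y"] = 1.0 + 0.6 * df["x1"] + 0.8 * df["x2"] + rng.normal(0, 0.5, n)

# OLS with country dummies (LSDV).
res = smf.ols("y ~ x1 + x2 + C(country)", data=df).fit()

# Design matrix used in the fit: intercept, dummies, and regressors.
X = pd.DataFrame(res.model.exog, columns=res.model.exog_names, index=df.index)

# Manual prediction with full-precision vs. rounded coefficients.
manual_full = X @ res.params
manual_rounded = X @ res.params.round(3)

max_diff_full = float(np.max(np.abs(manual_full - res.fittedvalues)))
max_diff_rounded = float(np.max(np.abs(manual_rounded - res.fittedvalues)))

# Residuals are the part of y that the fitted values do not contain.
resid_check = float(np.max(np.abs((df["y"] - res.fittedvalues) - res.resid)))

print("max |manual_full - fitted|   :", max_diff_full)
print("max |manual_rounded - fitted|:", max_diff_rounded)
print("max |(y - fitted) - resid|   :", resid_check)
```

With full-precision coefficients the manual result should match fittedvalues up to floating-point error, while rounding the coefficients introduces a visible gap that grows with the magnitude of the regressors.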