
Different intercept values for linear regression using statsmodels and numpy polyfit


I get two different intercept values from using the statsmodels regression fit and the numpy polyfit. The model is a simple linear regression with a single variable.

From the statsmodels regression I use:

results1 = smf.ols('np.log(NON_UND) ~ (np.log(Food_consumption))', data=Data2).fit()

which gives the following results:

                               coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------------
Intercept                    5.4433      0.270     20.154      0.000       4.911       5.976
np.log(Food_consumption)     1.1128      0.026     42.922      0.000       1.062       1.164

When plotting the data and adding a trendline using numpy polyfit, I receive a different intercept value:

x = np.array((np.log(Data2.Food_consumption)))
y = np.array((np.log(Data2.NON_UND)*100))

z = np.polyfit(x, y, 1)

array([ 1.11278898, 10.04846693])

How come I get two different values for the intercept?

Thanks in advance!


Solution

  • This is because the two fits use different dependent variables. In the statsmodels regression you model np.log(NON_UND), while in the polyfit call the dependent variable also involves a factor of 100, so the fitted intercepts cannot agree. (Your reported intercept, 10.048, is almost exactly 5.4433 + ln(100) ≈ 10.049, which suggests the factor of 100 effectively ended up inside the log; either way, the two models are not the same.) A short demonstration on synthetic data is given at the end of this answer.

    To get the same results from polyfit as from the statsmodels regression, fit exactly the same model, i.e. drop the factor of 100 from the dependent variable:

    x = np.log(np.array(Data2.Food_consumption))
    y = np.log(np.array(Data2.NON_UND))   # no factor of 100 here

    z = np.polyfit(x, y, 1)               # returns [slope, intercept]


    polyfit should then return the same slope and intercept as the statsmodels fit (note that polyfit lists the slope first and the intercept second).
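
    For reference, below is a minimal, self-contained sketch on synthetic data (the column names Food_consumption and NON_UND are taken from the question, but the data values are made up) showing that statsmodels and np.polyfit agree once the dependent variable is transformed the same way, and how a factor of 100 changes the result:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic log-log data, for illustration only
    rng = np.random.default_rng(0)
    food = rng.uniform(50, 500, size=200)
    non_und = np.exp(5.44 + 1.11 * np.log(food) + rng.normal(0, 0.3, size=200))
    Data2 = pd.DataFrame({"Food_consumption": food, "NON_UND": non_und})

    # Log-log OLS fit, as in the question
    results1 = smf.ols('np.log(NON_UND) ~ np.log(Food_consumption)', data=Data2).fit()
    print(results1.params)                      # Intercept, then slope

    # polyfit on the same transformed variables: returns [slope, intercept]
    x = np.log(np.array(Data2.Food_consumption))
    y = np.log(np.array(Data2.NON_UND))
    print(np.polyfit(x, y, 1))                  # matches the OLS fit

    # Multiplying log(y) by 100 scales both coefficients by 100
    print(np.polyfit(x, y * 100, 1))

    # Multiplying NON_UND by 100 inside the log shifts only the intercept,
    # by ln(100) ~= 4.605 (this matches the numbers in the question)
    print(np.polyfit(x, np.log(np.array(Data2.NON_UND * 100)), 1))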