Tags: python, numpy, scikit-learn, linear-regression, statsmodels

Calculating error of linear regression coefficient, given errors of y


I am processing lab measurements from a speed-of-sound experiment. To put my goal simply, I have a series of measurements y(x) as follows:

x       y
0       0
1     212
2     426
3     640
4     858
5    1074
6    1290
7    1506
8    1722
9    1939

I also know that each measurement of y may be off by 2. So, for example, with x = 1, y could be anywhere from 210 to 214. I want to know how much impact this error has on the coefficients of the linear regression.

I was using sklearn's LinearRegression, and with the fit_intercept=False parameter the task wasn't so hard: I just fit the series y - 2 and y + 2 and took the difference of the resulting coefficients. But now I have to do the same task without fit_intercept=False (so y is not forced to 0 when x is 0).
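For reference, this is roughly what I was doing (a minimal sketch; the data are the values above and 2 is my error estimate):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    x = np.arange(10).reshape(-1, 1)   # x column as a 2-D array, as sklearn expects
    y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939])

    # Fit the line through the origin on y - 2 and y + 2 and compare slopes.
    lo = LinearRegression(fit_intercept=False).fit(x, y - 2).coef_[0]
    hi = LinearRegression(fit_intercept=False).fit(x, y + 2).coef_[0]
    print(lo, hi, hi - lo)             # spread of the slope caused by the +/- 2 error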

So I am wondering: are there any officially implemented ways to achieve my goal? Not necessarily in sklearn.


Solution

  • The slope coefficient m in y = mx + c is found below. (I suspect that you only need the slope to get the speed of sound from your data.)

    (Case 1) If a non-zero intercept c is allowed, then the slope is:

    m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}

    and the denominator is positive. (It is N times the variance of x).

    To get the MAXIMUM slope, note that the denominator does not depend on y, so you want to maximize the numerator, which (because the deviations x_i - \bar{x} sum to zero) equals:

    \sum_i (x_i - \bar{x}) \, y_i

    So, take the greatest possible value of y if x is greater than x_mean and the smallest value of y if x is less than x_mean.

    To get the MINIMUM slope, minimize the numerator by doing the reverse.

    (Case 2) If the intercept c is forced to be zero (the line has to go through the origin) then the slope is:

    m = \frac{\sum_i x_i y_i}{\sum_i x_i^2}

    Since the x values are fixed, maximize the slope by taking the largest possible value of y where x is positive and the smallest possible value where x is negative. Again, do the reverse to get the minimum slope. A quick numerical check of both cases is sketched below.
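
    As that numerical check, here is a minimal NumPy sketch of both cases, using the data from the question and delta = 2 as the quoted error on each y (the helper function names are mine, not from any library):

        import numpy as np

        x = np.arange(10, dtype=float)
        y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939], dtype=float)
        delta = 2.0                       # quoted error on each y

        # Case 1: intercept allowed.  m = sum((x - x_mean) * y) / sum((x - x_mean)**2)
        xc = x - x.mean()

        def slope_free_intercept(yv):
            return np.sum(xc * yv) / np.sum(xc ** 2)

        m1     = slope_free_intercept(y)
        m1_max = slope_free_intercept(y + delta * np.sign(xc))  # y up where x > x_mean, down where x < x_mean
        m1_min = slope_free_intercept(y - delta * np.sign(xc))

        # Case 2: line forced through the origin.  m = sum(x * y) / sum(x**2)
        def slope_through_origin(yv):
            return np.sum(x * yv) / np.sum(x ** 2)

        m2     = slope_through_origin(y)
        m2_max = slope_through_origin(y + delta * np.sign(x))   # all x >= 0 here, and the x = 0 term drops out
        m2_min = slope_through_origin(y - delta * np.sign(x))

        print(f"with intercept:  m = {m1:.3f}  in [{m1_min:.3f}, {m1_max:.3f}]")
        print(f"through origin:  m = {m2:.3f}  in [{m2_min:.3f}, {m2_max:.3f}]")

    The min/max values here come from the worst-case construction described above, not from a statistical error estimate, so they bound the slope rather than give a standard error.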