I am processing lab measurements for a speed-of-sound experiment. Put simply, I have a series of measurements y(x) as follows:
x       y
0       0
1     212
2     426
3     640
4     858
5    1074
6    1290
7    1506
8    1722
9    1939
I also know that each measurement of y may be off by ±2. So, for example, at x = 1, y could be anywhere from 210 to 214. I want to know how much impact this error has on the coefficients of a linear regression.
I was using sklearn's LinearRegression, and with the fit_intercept=False parameter the task wasn't hard: I just fitted the series y - 2 and y + 2 and took the difference of the resulting coefficients, roughly as in the sketch below. But now I have to do the same task without fit_intercept=False (so y is not forced to be 0 when x is 0).
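Roughly, a minimal version of what I was doing (variable names are mine):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    x = np.arange(10).reshape(-1, 1)  # x values as a single-feature column
    y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939])

    # Fit the two shifted series and compare the resulting slopes.
    low = LinearRegression(fit_intercept=False).fit(x, y - 2)
    high = LinearRegression(fit_intercept=False).fit(x, y + 2)
    print(high.coef_[0] - low.coef_[0])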
So I am wondering: is there any officially implemented way to achieve my goal? It doesn't have to be in sklearn.
The slope coefficient m in y = mx + c is found as follows. (I suspect that you only need the slope to get the speed of sound from your data.)
(Case 1) If a non-zero intercept c is allowed then the slope is:

m = Σ (x_i - x_mean) (y_i - y_mean) / Σ (x_i - x_mean)^2

The denominator is positive. (It is N times the variance of x.)
Since Σ (x_i - x_mean) y_mean = 0, the numerator equals Σ (x_i - x_mean) y_i. So, to get the MAXIMUM slope you want to maximize:

Σ (x_i - x_mean) y_i
That is, take the greatest possible value of y_i where x_i is greater than x_mean and the smallest possible value where x_i is less than x_mean. To get the MINIMUM slope, minimize the numerator by doing the reverse.
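A minimal sketch of that recipe (NumPy only; the helper name slope_with_intercept is my own):

    import numpy as np

    x = np.arange(10)
    y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939])
    err = 2.0  # the +/- 2 measurement error on y

    def slope_with_intercept(x, y):
        # m = sum((x_i - x_mean) * (y_i - y_mean)) / sum((x_i - x_mean)**2)
        xc = x - x.mean()
        return xc @ (y - y.mean()) / (xc @ xc)

    # Maximum slope: push y up where x > x_mean and down where x < x_mean.
    shift = np.where(x > x.mean(), err, -err)
    m_max = slope_with_intercept(x, y + shift)
    m_min = slope_with_intercept(x, y - shift)  # the reverse
    print(m_min, m_max)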
(Case 2) If the intercept c is forced to be zero (the line has to go through the origin) then the slope is:

m = Σ x_i y_i / Σ x_i^2
Since the x values are fixed, maximize the slope by taking the largest possible value of y_i where x_i is positive and the smallest possible value where x_i is negative. Again, do the reverse to get the minimum slope.
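And a matching sketch for the zero-intercept case (again, the helper name is my own):

    import numpy as np

    x = np.arange(10)
    y = np.array([0, 212, 426, 640, 858, 1074, 1290, 1506, 1722, 1939])
    err = 2.0  # the +/- 2 measurement error on y

    def slope_through_origin(x, y):
        # m = sum(x_i * y_i) / sum(x_i**2)
        return (x @ y) / (x @ x)

    # Maximum slope: push y up where x is positive and down where x is negative.
    # (The point at x = 0 contributes nothing to either sum.)
    shift = np.where(x > 0, err, -err)
    m_max = slope_through_origin(x, y + shift)
    m_min = slope_through_origin(x, y - shift)
    print(m_min, m_max)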