I'm trying to fit a linear function for whom I know the slope to this data in R:
> flN
eCenter eLow eHigh arrivalTime_4_8 timeError_4_8 vCenter vLow vHigh
1 56.4997 60.25967 52.73972 2012-05-17 02:01:44 41.01283 2.298964 2.236939 2.367869
2 105.5787 108.02476 103.13254 2012-05-17 01:57:37 27.06803 1.787009 1.771748 1.802860
3 164.0613 166.50385 161.61884 2012-05-17 01:49:47 60.42015 1.530334 1.522997 1.537860
4 226.9886 229.78374 224.19346 2012-05-17 01:49:20 50.83871 1.386016 1.381233 1.390904
My Y is arrivalTime_4_8
and the error in Y is timeError_4_8
.
My X is vCenter
and the error vHigh
and vLow
for the upper and lower values.
So, have 3 problems about this (I'll put them in order of importance):
1 - How do I fit a linear function with a set slope? I tried to use lm
, but I can't find a way to put the slope there:
lm(formula = arrivalTime_4_8 ~ vCenter, data = flN)
2 - I know I must use weights = 1/sqrt(error) to get a weighted fit, but how do I do that with a timeframe axis?
3 - How do I calculate the weights to use if I have the error measurements in 2 axis? A simple squared sum?
> dput(flN)
structure(list(eCenter = c(56.4996953402131, 105.578651367445,
164.061343347697, 226.988601498086), eLow = c(60.2596687117468,
108.024759195324, 166.503850666149, 229.78374170578), eHigh = c(52.7397219686794,
103.132543539565, 161.618836029245, 224.193461290392), arrivalTime_4_8 = structure(c(1337216504.88343,
1337216257.43636, 1337215787.45439, 1337215760.18075), tzone = "", class = c("POSIXct",
"POSIXt")), timeError_4_8 = c(41.0128309582924, 27.0680250428967,
60.4201539087451, 50.8387114506802), vCenter = c(2.29896400265163,
1.78700866407098, 1.53033400915538, 1.38601563560752), vLow = c(2.23693860876912,
1.77174836673712, 1.52299693760316, 1.38123278889559), vHigh = c(2.36786898717605,
1.80286021104626, 1.5378599964982, 1.39090403398638)), .Names = c("eCenter",
"eLow", "eHigh", "arrivalTime_4_8", "timeError_4_8", "vCenter",
"vLow", "vHigh"), row.names = c(NA, -4L), class = "data.frame")
Responding to @jbssm comment, @BenBolker solution seems to give very reasonable results:
vCenter_slope <- 600.7472
flN <- transform(flN,w=1/as.numeric(timeError_4_8)^2)
# slope fixed
m1 <- lm(as.numeric(arrivalTime_4_8) ~ 1 + offset(vCenter*vCenter_slope), data = flN, weights=w)
# slope chosen by lm(...)
m0 <- lm(as.numeric(arrivalTime_4_8) ~ vCenter, data=flN, weights=w)
library(ggplot2)
ggp <- ggplot(flN)
ggp <- ggp + geom_point(aes(x=vCenter,y=arrivalTime_4_8))
ggp <- ggp + geom_errorbar(aes(x=vCenter, ymin=arrivalTime_4_8-timeError_4_8,ymax=arrivalTime_4_8+timeError_4_8),width=0)
ggp <- ggp + geom_errorbarh(aes(x=vCenter, y=arrivalTime_4_8, xmax=vHigh, xmin=vLow),height=0)
ggp <- ggp + geom_abline(slope=600.7472, intercept=coef(m1)[1])
ggp <- ggp + geom_abline(slope=coef(m0)[2], intercept=coef(m0)[1],color="red")
ggp
The red line allows lm(...)
to set the slope, the black line uses fixed slope. Note that the black line is further away from the first two vCenter
points than you might expect, because you're weighting inversely to error in vCenter
. If this is your whole dataset, then using weights is highly questionable.
Finally, you can read up on "errors-in-variables" models here and here. After reading these, you might want to look at the MethComp package, which supports `Deming Regression' (one type of errors-in-variables regression).