Search code examples
rlinear-regressionweighted

Linear fit with a previously known slope


I'm trying to fit a linear function for whom I know the slope to this data in R:

> flN
   eCenter      eLow     eHigh     arrivalTime_4_8 timeError_4_8  vCenter     vLow    vHigh
1  56.4997  60.25967  52.73972 2012-05-17 02:01:44      41.01283 2.298964 2.236939 2.367869
2 105.5787 108.02476 103.13254 2012-05-17 01:57:37      27.06803 1.787009 1.771748 1.802860
3 164.0613 166.50385 161.61884 2012-05-17 01:49:47      60.42015 1.530334 1.522997 1.537860
4 226.9886 229.78374 224.19346 2012-05-17 01:49:20      50.83871 1.386016 1.381233 1.390904

My Y is arrivalTime_4_8 and the error in Y is timeError_4_8. My X is vCenter and the error vHigh and vLow for the upper and lower values.

So, have 3 problems about this (I'll put them in order of importance):

1 - How do I fit a linear function with a set slope? I tried to use lm, but I can't find a way to put the slope there:

lm(formula = arrivalTime_4_8 ~ vCenter, data = flN)

2 - I know I must use weights = 1/sqrt(error) to get a weighted fit, but how do I do that with a timeframe axis?

3 - How do I calculate the weights to use if I have the error measurements in 2 axis? A simple squared sum?

> dput(flN)
structure(list(eCenter = c(56.4996953402131, 105.578651367445, 
164.061343347697, 226.988601498086), eLow = c(60.2596687117468, 
108.024759195324, 166.503850666149, 229.78374170578), eHigh = c(52.7397219686794, 
103.132543539565, 161.618836029245, 224.193461290392), arrivalTime_4_8 = structure(c(1337216504.88343, 
1337216257.43636, 1337215787.45439, 1337215760.18075), tzone = "", class = c("POSIXct", 
"POSIXt")), timeError_4_8 = c(41.0128309582924, 27.0680250428967, 
60.4201539087451, 50.8387114506802), vCenter = c(2.29896400265163, 
1.78700866407098, 1.53033400915538, 1.38601563560752), vLow = c(2.23693860876912, 
1.77174836673712, 1.52299693760316, 1.38123278889559), vHigh = c(2.36786898717605, 
1.80286021104626, 1.5378599964982, 1.39090403398638)), .Names = c("eCenter", 
"eLow", "eHigh", "arrivalTime_4_8", "timeError_4_8", "vCenter", 
"vLow", "vHigh"), row.names = c(NA, -4L), class = "data.frame")

Solution

  • Responding to @jbssm comment, @BenBolker solution seems to give very reasonable results:

    vCenter_slope <- 600.7472
    flN <- transform(flN,w=1/as.numeric(timeError_4_8)^2)
    # slope fixed
    m1 <- lm(as.numeric(arrivalTime_4_8) ~ 1 + offset(vCenter*vCenter_slope), data = flN, weights=w)
    # slope chosen by lm(...)
    m0 <- lm(as.numeric(arrivalTime_4_8) ~ vCenter, data=flN, weights=w)
    
    library(ggplot2)
    ggp <- ggplot(flN)
    ggp <- ggp + geom_point(aes(x=vCenter,y=arrivalTime_4_8))
    ggp <- ggp + geom_errorbar(aes(x=vCenter, ymin=arrivalTime_4_8-timeError_4_8,ymax=arrivalTime_4_8+timeError_4_8),width=0)
    ggp <- ggp + geom_errorbarh(aes(x=vCenter, y=arrivalTime_4_8, xmax=vHigh, xmin=vLow),height=0)
    ggp <- ggp + geom_abline(slope=600.7472, intercept=coef(m1)[1])
    ggp <- ggp + geom_abline(slope=coef(m0)[2], intercept=coef(m0)[1],color="red")
    ggp
    

    The red line allows lm(...) to set the slope, the black line uses fixed slope. Note that the black line is further away from the first two vCenter points than you might expect, because you're weighting inversely to error in vCenter. If this is your whole dataset, then using weights is highly questionable.

    Finally, you can read up on "errors-in-variables" models here and here. After reading these, you might want to look at the MethComp package, which supports `Deming Regression' (one type of errors-in-variables regression).