
Calculate the distance between a regression line and a data point


Is there a way to calculate the distance between the line drawn by abline in a plot and a data point? For example, what is the distance between the point with concentration == 40 and signal == 643 (element 5) and the line?

concentration <- c(1,10,20,30,40,50)
signal <- c(4, 22, 44, 244, 643, 1102)
plot(concentration, signal)
res <- lm(signal ~ concentration)
abline(res)

Solution

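  • You are basically asking for the residuals: the vertical distance between each observed value and the fitted line (observed minus fitted). For element 5 the residual is -26.57, so that point lies about 26.57 signal units below the line.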

    R> residuals(res)
          1       2       3       4       5       6 
     192.61   12.57 -185.48 -205.52  -26.57  212.39 
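
    To pull out just the point from the question, one option is to index the residual vector. This is a minimal sketch that rebuilds the question's concentration, signal and res objects; r5, fitted5 and manual5 are illustrative names, not part of the original code. It also recomputes the value by hand as observed minus fitted, which shows that the residual is the vertical distance to the line rather than the perpendicular one.

    ```r
    # Data and model from the question
    concentration <- c(1, 10, 20, 30, 40, 50)
    signal <- c(4, 22, 44, 244, 643, 1102)
    res <- lm(signal ~ concentration)

    # Residual of the fifth observation (concentration == 40, signal == 643)
    r5 <- residuals(res)[5]

    # Same quantity by hand: observed signal minus the fitted value on the line
    fitted5 <- predict(res, newdata = data.frame(concentration = 40))
    manual5 <- 643 - fitted5

    print(r5)
    print(manual5)
    ```

    Both values should agree with the fifth entry of the residuals(res) output above.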
    

    As an aside, when you fit a linear regression with an intercept, the residuals sum to zero (up to floating-point error):

    R> sum(residuals(res))
    [1] 8.882e-15
    

    If the model is correct, the residuals should also follow a Normal distribution, which you can check with qqnorm(residuals(res)).
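
    A Normal Q-Q plot of the residuals could be drawn along these lines. This is only a sketch: it rebuilds the question's data so that it runs on its own, and r and qq are illustrative names. With six observations the plot gives no more than a rough visual check.

    ```r
    # Data and model from the question
    concentration <- c(1, 10, 20, 30, 40, 50)
    signal <- c(4, 22, 44, 244, 643, 1102)
    res <- lm(signal ~ concentration)

    r <- residuals(res)

    # Sample quantiles of the residuals against theoretical Normal quantiles
    qq <- qqnorm(r)

    # Reference line through the first and third quartiles
    qqline(r)
    ```

    Points that fall close to the reference line are consistent with Normally distributed residuals.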

    I find working with the standardised residuals easier.

    R> rstandard(res)
           1        2        3        4        5        6 
     1.37707  0.07527 -1.02653 -1.13610 -0.15845  1.54918 
    

    Standardised residuals are scaled so that, if the model is correct, they have mean zero, variance approximately equal to one and an approximately Normal distribution. As a rule of thumb, a standardised residual is outlying when its absolute value is larger than 2.
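
    The rule of thumb above can be applied directly to the output of rstandard. In this minimal sketch, std_res, cutoff and outlying are illustrative names, and the cut-off of 2 is the rule-of-thumb value from the text rather than a formal test.

    ```r
    # Data and model from the question
    concentration <- c(1, 10, 20, 30, 40, 50)
    signal <- c(4, 22, 44, 244, 643, 1102)
    res <- lm(signal ~ concentration)

    std_res <- rstandard(res)

    # Rule-of-thumb threshold on the absolute standardised residual
    cutoff <- 2

    # Indices of observations whose standardised residual exceeds the cut-off
    outlying <- which(abs(std_res) > cutoff)
    print(outlying)
    ```

    For these data outlying is empty, because the largest standardised residual in absolute value is about 1.55.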