Search code examples
rcurve-fittingcurve

Vector projection to determine elbow in fitted curve


I have a graph that has been fitted with nonlinear regression. I would like to calculate the elbow in the fitted curve. The majority of 2nd differential methods fail to accurately capture this point, and a visual inspection seems to be the only recourse (not useful for automation). The closest thing to an automated "visual" approach would be to use vector projection to calculate the farthest data point from a line that connects the first and last points of the data set (see question mark below). Can this line, which is normal to the line connecting the first and last point, be calculated using R?

My nonlinear function is: result <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5), data = myData)

enter image description here


Solution

  • Give this a try. See if it works on your real data.

    library(MASS)
    
    # fake data
    x <- 5:300
    y <- (x - 0.03*x^2 + 0.02*x^3 + rnorm(length(x), sd=5000))/1000
    myData <- data.frame(x, y)
    
    # fitted curve (I used a simpler example)
    result <- lm(y ~ x + I(x^2) + I(x^3), data=myData)
    p <- fitted(result)
    
    # line connecting endpoints of fitted curve
    i1 <- which.min(x)
    i2 <- which.max(x)
    slope <- (p[i2] - p[i1]) / (x[i2] - x[i1])
    int <- p[i1] - slope*x[i1]
    
    # for every point on the predicted curve (xi, pi), the perpendicular line that goes through that point has
    perpslope <- -1/slope
    perpint <- p - perpslope*x
    
    # the intersection of the perp line(s) with the connecting line is
    xcross <- (int - perpint) / (perpslope - slope)
    ycross <- slope*xcross + int
    
    # the distance between the intersection and the point(s) is
    dists <- sqrt((x - xcross)^2 + (y - ycross)^2)
    
    # the index of the farthest point
    elbowi <- which.max(dists)
    
    # plot the data
    eqscplot(x, y)
    lines(x[c(i1, i2)], p[c(i1, i2)])
    points(x[elbowi], p[elbowi], pch=16, col="red")
    lines(x[order(x)], p[order(x)], col="blue")
    lines(c(x[elbowi], xcross[elbowi]), c(p[elbowi], ycross[elbowi]), col="red")