Search code examples
rlinear-regressioncurve

Curve approximation from regression


I am trying to find the tangent vector of a curve. The equation of the curve is a question, I have different points, and based on these points I am looking for an approximation of the function, that describes the curve and fits the points.

When I plot my data it looks like this: enter image description here

After applying polynomial regression (based on this article: https://www.statology.org/curve-fitting-in-r/) I get the following result:

fit <- lm(cl2[,3] ~ poly(cl2[,2], 3))
    summary(fit)
Call:
lm(formula = cl2[, 3] ~ poly(cl2[, 2], 3))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.31834 -0.10187  0.02132  0.09577  0.27393 

Coefficients:
                     Estimate Std. Error   t value Pr(>|t|)    
(Intercept)        -109.89121    0.03789 -2900.217  < 2e-16 ***
poly(cl2[, 2], 3)1    7.33365    0.16516    44.403  < 2e-16 ***
poly(cl2[, 2], 3)2   -4.43572    0.16516   -26.857 4.25e-14 ***
poly(cl2[, 2], 3)3    1.14772    0.16516     6.949 4.66e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1652 on 15 degrees of freedom
Multiple R-squared:  0.9946,    Adjusted R-squared:  0.9935 
F-statistic: 913.7 on 3 and 15 DF,  p-value: < 2.2e-16

When I fit the curve, the result looks ok:

lines(cl2[,2], predict(fit, data.frame(cl2[,2:3])))

enter image description here

Based on the coefficient s I would assume that the equation of the curve is:

1.14x**3-4.43x**2+7.33*x-109

When I calculate the y estimates I get very strange numbers:

y_actual:

1 -108.4569 -108.1504 -108.0895 -108.0728 -108.0461 -108.1777 -108.2751 -108.4619 -108.6918 [10] -108.9750 -109.3552 -109.7625 -110.3328 -110.9580 -111.4312 -112.0062 -112.7337 -113.5880 [19] -114.3681

y_predicted:

1 -8935267 -8980331 -9044297 -9115821 -9166614 -9270340 -9355643 -9456574 -9533497 -9602089 [11] -9631113 -9670175 -9715100 -9754453 -9798813 -9851816 -9880888 -9926067 -9940310

What is wrong here?

I tried setting the raw variable of poly function to TRUE and I got different coefficients, but the problem persists.


Edit

Data in dput format

y_actual <-
c(-108.4569, -108.1504, -108.0895, -108.0728, -108.0461, -108.1777, 
-108.2751, -108.4619, -108.6918, -108.975, -109.3552, -109.7625, 
-110.3328, -110.958, -111.4312, -112.0062, -112.7337, -113.588, 
-114.3681)

y_predicted <-
c(-8935267, -8980331, -9044297, -9115821, -9166614, -9270340, 
-9355643, -9456574, -9533497, -9602089, -9631113, -9670175, -9715100, 
-9754453, -9798813, -9851816, -9880888, -9926067, -9940310)

EDIT:

x_values <- c(-197.3419, -197.6753, -198.1467, -198.6710, -199.0418, -199.7946, -200.4095, -201.1323, -201.6797, -202.1654, -202.3702, -202.6451, -202.9605, -203.2359, -203.5455, -203.9142, -204.1158, -204.4285, -204.5268)

PROBLEM SOLUTION:

Thank you all for the valuable input. It helped a lot.

I came up with the following function:

#find tangent line
tangent_xy <- function(point_index, centerline){
  #fit the polynomial regression
  fit <- lm(centerline[,3] ~ poly(centerline[,2], 3, raw = T))
  # get coefficients
  cf <- fit$coefficients
  # equation of fitted curve
  (eq <- paste(sprintf('%s*x^%s', cf, seq_along(cf) - 1L), collapse='+'))
  # first derivative of fitted curve
  f <- D(parse(text = eq), "x")
  # calculate slope (value of derivative at given point)
  slope <- eval(f, envir = list(x = cl2[point_index,2]))
  #get coordinates of point
  x0 <- centerline[point_index, 2]
  y0 <- centerline[point_index, 3]
  # equation of tangent line
  y = slope*centerline[,2]-slope*x0+y0
  # points for plotting with lines function
  return(y)
}

Using the function like this:

curve(ff, min(cl2[,2]), max(cl2[,2]))
lines(cl2[,2], tangent_xy(3, cl2))
lines(cl2[,2], tangent_xy(12, cl2))
lines(cl2[,2], tangent_xy(15, cl2))
lines(cl2[,2], tangent_xy(7, cl2))

I get the following output:

Curve with tangents

It is not perfect, however I only need the approximation, so it will do. I will look into GAMs as Roland suggested. Maybe those will work better.


Solution

  • To generate equation from coefficients, use sprintf for power series. parsed into a function, we may plot it using curve over the points.

    > cf <- lm(cl2[, 3] ~ poly(cl2[, 2], 3, raw=TRUE)) |> coef()
    > (eq <- paste(sprintf('%s*x^%s', cf, seq_along(cf) - 1L), collapse='+'))
    [1] "-2007.84158109665*x^0+-24.0105660224912*x^1+-0.0949876187837792*x^2+-0.000111799887426291*x^3"
    > f <- eval(parse(text=paste('\\(x)', eq)))
    > curve(f, min(cl2[, 2]), max(cl2[, 2]), col=2, panel.first=points(cl2[, -1]))
    

    enter image description here


    Data:

    cl2 <- structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, -205, -204, -203, -202, -201, -200, -199, -198, -197, -196, 
    -195, -194, -193, -192, -191, -190, -189, -188, -187, -114.3681, 
    -113.588, -112.7337, -112.0062, -111.4312, -110.958, -110.3328, 
    -109.7625, -109.3552, -108.975, -108.6918, -108.4619, -108.2751, 
    -108.1777, -108.0461, -108.0728, -108.0895, -108.1504, -108.4569
    ), dim = c(19L, 3L), dimnames = list(NULL, c("", "x", "y")))