I am trying to find the tangent vector of a curve. The equation of the curve is a question, I have different points, and based on these points I am looking for an approximation of the function, that describes the curve and fits the points.
When I plot my data it looks like this:
After applying polynomial regression (based on this article: https://www.statology.org/curve-fitting-in-r/) I get the following result:
fit <- lm(cl2[,3] ~ poly(cl2[,2], 3))
summary(fit)
Call:
lm(formula = cl2[, 3] ~ poly(cl2[, 2], 3))
Residuals:
Min 1Q Median 3Q Max
-0.31834 -0.10187 0.02132 0.09577 0.27393
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -109.89121 0.03789 -2900.217 < 2e-16 ***
poly(cl2[, 2], 3)1 7.33365 0.16516 44.403 < 2e-16 ***
poly(cl2[, 2], 3)2 -4.43572 0.16516 -26.857 4.25e-14 ***
poly(cl2[, 2], 3)3 1.14772 0.16516 6.949 4.66e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1652 on 15 degrees of freedom
Multiple R-squared: 0.9946, Adjusted R-squared: 0.9935
F-statistic: 913.7 on 3 and 15 DF, p-value: < 2.2e-16
When I fit the curve, the result looks ok:
lines(cl2[,2], predict(fit, data.frame(cl2[,2:3])))
Based on the coefficient s I would assume that the equation of the curve is:
1.14x**3-4.43x**2+7.33*x-109
When I calculate the y estimates I get very strange numbers:
y_actual:
1 -108.4569 -108.1504 -108.0895 -108.0728 -108.0461 -108.1777 -108.2751 -108.4619 -108.6918 [10] -108.9750 -109.3552 -109.7625 -110.3328 -110.9580 -111.4312 -112.0062 -112.7337 -113.5880 [19] -114.3681
y_predicted:
1 -8935267 -8980331 -9044297 -9115821 -9166614 -9270340 -9355643 -9456574 -9533497 -9602089 [11] -9631113 -9670175 -9715100 -9754453 -9798813 -9851816 -9880888 -9926067 -9940310
What is wrong here?
I tried setting the raw variable of poly function to TRUE and I got different coefficients, but the problem persists.
Data in dput
format
y_actual <-
c(-108.4569, -108.1504, -108.0895, -108.0728, -108.0461, -108.1777,
-108.2751, -108.4619, -108.6918, -108.975, -109.3552, -109.7625,
-110.3328, -110.958, -111.4312, -112.0062, -112.7337, -113.588,
-114.3681)
y_predicted <-
c(-8935267, -8980331, -9044297, -9115821, -9166614, -9270340,
-9355643, -9456574, -9533497, -9602089, -9631113, -9670175, -9715100,
-9754453, -9798813, -9851816, -9880888, -9926067, -9940310)
EDIT:
x_values <- c(-197.3419, -197.6753, -198.1467, -198.6710, -199.0418, -199.7946, -200.4095, -201.1323, -201.6797, -202.1654, -202.3702, -202.6451, -202.9605, -203.2359, -203.5455, -203.9142, -204.1158, -204.4285, -204.5268)
PROBLEM SOLUTION:
Thank you all for the valuable input. It helped a lot.
I came up with the following function:
#find tangent line
tangent_xy <- function(point_index, centerline){
#fit the polynomial regression
fit <- lm(centerline[,3] ~ poly(centerline[,2], 3, raw = T))
# get coefficients
cf <- fit$coefficients
# equation of fitted curve
(eq <- paste(sprintf('%s*x^%s', cf, seq_along(cf) - 1L), collapse='+'))
# first derivative of fitted curve
f <- D(parse(text = eq), "x")
# calculate slope (value of derivative at given point)
slope <- eval(f, envir = list(x = cl2[point_index,2]))
#get coordinates of point
x0 <- centerline[point_index, 2]
y0 <- centerline[point_index, 3]
# equation of tangent line
y = slope*centerline[,2]-slope*x0+y0
# points for plotting with lines function
return(y)
}
Using the function like this:
curve(ff, min(cl2[,2]), max(cl2[,2]))
lines(cl2[,2], tangent_xy(3, cl2))
lines(cl2[,2], tangent_xy(12, cl2))
lines(cl2[,2], tangent_xy(15, cl2))
lines(cl2[,2], tangent_xy(7, cl2))
I get the following output:
It is not perfect, however I only need the approximation, so it will do. I will look into GAMs as Roland suggested. Maybe those will work better.
To generate equation from coef
ficients, use sprintf
for power series. parse
d into a function, we may plot it using curve
over the points
.
> cf <- lm(cl2[, 3] ~ poly(cl2[, 2], 3, raw=TRUE)) |> coef()
> (eq <- paste(sprintf('%s*x^%s', cf, seq_along(cf) - 1L), collapse='+'))
[1] "-2007.84158109665*x^0+-24.0105660224912*x^1+-0.0949876187837792*x^2+-0.000111799887426291*x^3"
> f <- eval(parse(text=paste('\\(x)', eq)))
> curve(f, min(cl2[, 2]), max(cl2[, 2]), col=2, panel.first=points(cl2[, -1]))
Data:
cl2 <- structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, -205, -204, -203, -202, -201, -200, -199, -198, -197, -196,
-195, -194, -193, -192, -191, -190, -189, -188, -187, -114.3681,
-113.588, -112.7337, -112.0062, -111.4312, -110.958, -110.3328,
-109.7625, -109.3552, -108.975, -108.6918, -108.4619, -108.2751,
-108.1777, -108.0461, -108.0728, -108.0895, -108.1504, -108.4569
), dim = c(19L, 3L), dimnames = list(NULL, c("", "x", "y")))