I have data like in the following image and I want to calculate the area under the curve between the blue lines x = 5.75 and x = 6.45:
I have tried some of the answers given here and here. pracma::trapz does not allow me to specify lower and upper limits on x:
pracma::trapz(x = df$X, y = df$Y)
## [1] 5809.75
MESS::auc does support lower and upper limits on x, and so does integrate from base R:
MESS::auc(x = df$X, y = df$Y, from = 5.75, to = 6.45, type = "spline")
## [1] 328.043
integrate(approxfun(df$X, df$Y), lower = 5.75, upper = 6.45)
## 327.8377 with absolute error < 0.03
But I suspect that these two functions are calculating the area like this:
I only want the area up to the red line shown here:
Here is the data:
df <-
structure(list(X = c(4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8,
4.9, 5, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6, 6.1,
6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7, 7.1, 7.2, 7.3, 7.4,
7.5, 7.6, 7.7, 7.8, 7.9, 8, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7,
8.8, 8.9, 9, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10,
10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11, 11.1,
11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12, 12.1, 12.2,
12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 13, 13.1, 13.2, 13.3,
13.4, 13.5, 13.6, 13.7, 13.8, 13.9, 14, 14.1, 14.2, 14.3, 14.4,
14.5, 14.6, 14.7, 14.8, 14.9, 15, 15.1, 15.2, 15.3, 15.4, 15.5,
15.6, 15.7, 15.8, 15.9, 16, 16.1, 16.2, 16.3, 16.4, 16.5, 16.6,
16.7, 16.8, 16.9, 17, 17.1, 17.2, 17.3, 17.4, 17.5, 17.6, 17.7,
17.8, 17.9, 18, 18.1, 18.2, 18.3, 18.4, 18.5, 18.6, 18.7, 18.8,
18.9, 19, 19.1, 19.2, 19.3, 19.4, 19.5, 19.6, 19.7, 19.8, 19.9,
20, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7, 20.8, 20.9, 21,
21.1, 21.2, 21.3, 21.4, 21.5, 21.6, 21.7, 21.8, 21.9, 22, 22.1,
22.2, 22.3, 22.4, 22.5, 22.6, 22.7, 22.8, 22.9, 23, 23.1, 23.2,
23.3, 23.4, 23.5, 23.6, 23.7, 23.8, 23.9, 24, 24.1, 24.2, 24.3,
24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25, 25.1, 25.2, 25.3, 25.4,
25.5, 25.6, 25.7, 25.8, 25.9, 26, 26.1, 26.2, 26.3, 26.4, 26.5,
26.6, 26.7, 26.8, 26.9, 27, 27.1, 27.2, 27.3, 27.4, 27.5, 27.6,
27.7, 27.8, 27.9, 28, 28.1, 28.2, 28.3, 28.4, 28.5, 28.6, 28.7,
28.8, 28.9, 29, 29.1, 29.2, 29.3, 29.4, 29.5, 29.6, 29.7, 29.8,
29.9, 30), Y = c(625, 548, 586, 552, 557, 586, 552, 511, 529,
506, 529, 497, 462, 484, 467, 471, 441, 462, 475, 552, 511, 471,
416, 396, 380, 361, 328, 350, 388, 365, 303, 328, 357, 346, 320,
317, 346, 339, 320, 376, 357, 361, 346, 400, 420, 433, 497, 449,
388, 372, 361, 346, 342, 299, 279, 282, 306, 306, 289, 253, 266,
259, 262, 237, 253, 237, 250, 234, 219, 231, 219, 243, 246, 204,
225, 202, 207, 202, 219, 193, 216, 262, 286, 272, 216, 199, 193,
185, 154, 154, 182, 169, 149, 144, 180, 154, 164, 139, 137, 139,
137, 154, 144, 156, 142, 146, 159, 119, 137, 132, 151, 132, 128,
132, 149, 119, 154, 151, 144, 144, 149, 161, 125, 149, 149, 156,
139, 135, 142, 146, 130, 169, 132, 169, 149, 164, 216, 202, 188,
166, 177, 164, 172, 182, 154, 188, 174, 196, 154, 149, 166, 135,
144, 144, 144, 135, 137, 135, 146, 169, 137, 139, 123, 123, 137,
137, 119, 149, 144, 132, 125, 119, 123, 135, 130, 123, 130, 130,
142, 139, 132, 130, 123, 123, 121, 121, 121, 164, 121, 130, 130,
146, 137, 146, 117, 139, 144, 130, 132, 144, 177, 159, 144, 161,
172, 144, 169, 193, 222, 282, 272, 246, 207, 213, 196, 210, 234,
204, 219, 213, 234, 256, 216, 259, 250, 276, 324, 313, 262, 213,
204, 185, 164, 180, 164, 182, 169, 166, 151, 144, 128, 119, 146,
137, 121, 164, 121, 144, 128, 128, 144, 135, 121, 139, 128, 144,
130, 149, 119, 0)), row.names = c(NA, 260L), class = "data.frame")
Your earlier attempt was correct, except that you used x instead of X and y instead of Y. R is case-sensitive, and the columns of df are named X and Y.
integrate(approxfun(df$X, df$Y), lower = 5.75, upper = 6.45)
327.8377 with absolute error < 0.03
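If you want to cross-check that number without numerical quadrature, one option is to apply the trapezoidal rule by hand to the data points that fall inside the limits, plus the interpolated values at the limits themselves. The sketch below is only an illustration: it uses the nine rows of df that surround the interval (copied from the question) rather than the full data frame, and the names x, y, f, xs, ys and area are placeholders of mine.

```r
# Rows of the question's data around the interval of interest
x <- c(5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5)
y <- c(441, 462, 475, 552, 511, 471, 416, 396, 380)

f <- approxfun(x, y)   # linear interpolant, as in the integrate() call

lower <- 5.75
upper <- 6.45

# Grid: the two limits plus every data point strictly between them
xs <- c(lower, x[x > lower & x < upper], upper)
ys <- f(xs)

# Trapezoidal rule: segment width times the mean of the two segment heights
area <- sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)
area
```

Because the interpolant is piecewise linear, this sum is exact for it, and it should come out at roughly 327.8375, in line with what integrate reports.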
Update to respond to updated question
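To get the area of the trapezoidal region in your update, you just need the height of the interpolated curve at the two endpoints: the area is the mean of those two heights multiplied by the width of the interval.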
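## Linear interpolant of the data (named f so it does not mask F, R's shorthand for FALSE)
f <- approxfun(df$X, df$Y)
## Area of the trapezoid: mean endpoint height times interval width
(f(5.75) + f(6.45)) / 2 * (6.45 - 5.75)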
[1] 293.825
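The same calculation can be wrapped in a small function if you need it for several intervals. This is a minimal sketch, not a definitive implementation: trapezoid_area is a hypothetical helper name, and the data are just the nine rows of df around the interval, copied from the question so that the block runs on its own. I am also assuming the red line is the straight line joining the curve at the two limits.

```r
# Rows of the question's data around the interval of interest
x <- c(5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5)
y <- c(441, 462, 475, 552, 511, 471, 416, 396, 380)

f <- approxfun(x, y)   # linear interpolant of the data

# Hypothetical helper: area under the straight line joining
# (lower, f(lower)) and (upper, f(upper))
trapezoid_area <- function(f, lower, upper) {
  (f(lower) + f(upper)) / 2 * (upper - lower)
}

under_line  <- trapezoid_area(f, 5.75, 6.45)
under_curve <- integrate(f, lower = 5.75, upper = 6.45)$value

under_line                 # area below the straight line
under_curve - under_line   # area between the curve and the straight line
```

Here under_line reproduces the 293.825 above. If what you are actually after is the peak sitting on top of the red line, the difference in the last line (about 34) would be the quantity to use instead.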