
How to calculate the area under a curve in R?


I have data like that in the following plot, and I want to calculate the area under the curve between the blue lines at x = 5.75 and x = 6.45:

[Image: plot of the data, with blue vertical lines at x = 5.75 and x = 6.45]

I have tried some of the answers given here and here. However, pracma::trapz does not let me specify lower and upper limits on x:

pracma::trapz(x = df$X, y = df$Y)
## [1] 5809.75
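pracma::trapz has no from/to arguments, but the trapezoidal rule can still be restricted to an interval: interpolate Y at the two limits, keep only the observations strictly inside, and sum the trapezoids of that clipped polyline. The sketch below uses base R only; `trapz_between` is a hypothetical helper name, and the nine points are copied from `df` (X from 5.7 to 6.5) so the snippet runs on its own.

```r
# Hypothetical helper: trapezoidal rule restricted to [from, to].
# Y is linearly interpolated at the two limits, so the result is the exact
# area under the piecewise-linear curve through the data points.
trapz_between <- function(x, y, from, to) {
  inside <- x > from & x < to
  xs <- c(from, x[inside], to)
  ys <- approx(x, y, xout = xs)$y  # linear interpolation at every node
  # Sum of trapezoids: segment width times mean segment height
  sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)
}

# Subset of df from the question (X from 5.7 to 6.5)
x <- c(5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5)
y <- c(441, 462, 475, 552, 511, 471, 416, 396, 380)

area <- trapz_between(x, y, from = 5.75, to = 6.45)
print(area)
# [1] 327.8375
```

Calling it on the full data, `trapz_between(df$X, df$Y, 5.75, 6.45)`, should give the same value, because only the points inside the limits (plus the two interpolated endpoints) enter the sum.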

MESS::auc does support lower and upper limits on x, and so does integrate() from base R's stats package:

MESS::auc(x = df$X, y = df$Y, from = 5.75, to = 6.45, type = "spline")
## [1] 328.043

integrate(approxfun(df$X, df$Y), lower = 5.75, upper = 6.45)
## 327.8377 with absolute error < 0.03
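The two results differ slightly (328.043 vs 327.8377), most likely because `type = "spline"` integrates a smooth spline through the points, whereas approxfun() joins them with straight lines. The base-R sketch below compares the two interpolants. It assumes a natural cubic interpolating spline (`splinefun(..., method = "natural")`), which may not be exactly what MESS::auc uses internally, and it uses only the nine points of `df` around the interval, so the spline value will not reproduce 328.043 exactly.

```r
# Points copied from df in the question (X from 5.7 to 6.5)
x <- c(5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5)
y <- c(441, 462, 475, 552, 511, 471, 416, 396, 380)

# Piecewise-linear interpolant: straight segments between data points
f_linear <- approxfun(x, y)
# Assumed spline variant: natural cubic spline through the same points
f_spline <- splinefun(x, y, method = "natural")

area_linear <- integrate(f_linear, lower = 5.75, upper = 6.45)$value
area_spline <- integrate(f_spline, lower = 5.75, upper = 6.45)$value

# Both are "area under the curve"; they differ only in how the curve
# is drawn between the observations.
c(linear = area_linear, spline = area_spline)
```

Neither interpolant changes the integration limits; they only change the shape of the curve between observed points.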

But I suspect that these two functions are calculating the area like this:

[Image not available]

I only want the area up to the red line shown here:

[Image not available]

Here is the data:

df <- 
structure(list(X = c(4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 
4.9, 5, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9, 6, 6.1, 
6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, 7, 7.1, 7.2, 7.3, 7.4, 
7.5, 7.6, 7.7, 7.8, 7.9, 8, 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7, 
8.8, 8.9, 9, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9, 10, 
10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11, 11.1, 
11.2, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 12, 12.1, 12.2, 
12.3, 12.4, 12.5, 12.6, 12.7, 12.8, 12.9, 13, 13.1, 13.2, 13.3, 
13.4, 13.5, 13.6, 13.7, 13.8, 13.9, 14, 14.1, 14.2, 14.3, 14.4, 
14.5, 14.6, 14.7, 14.8, 14.9, 15, 15.1, 15.2, 15.3, 15.4, 15.5, 
15.6, 15.7, 15.8, 15.9, 16, 16.1, 16.2, 16.3, 16.4, 16.5, 16.6, 
16.7, 16.8, 16.9, 17, 17.1, 17.2, 17.3, 17.4, 17.5, 17.6, 17.7, 
17.8, 17.9, 18, 18.1, 18.2, 18.3, 18.4, 18.5, 18.6, 18.7, 18.8, 
18.9, 19, 19.1, 19.2, 19.3, 19.4, 19.5, 19.6, 19.7, 19.8, 19.9, 
20, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7, 20.8, 20.9, 21, 
21.1, 21.2, 21.3, 21.4, 21.5, 21.6, 21.7, 21.8, 21.9, 22, 22.1, 
22.2, 22.3, 22.4, 22.5, 22.6, 22.7, 22.8, 22.9, 23, 23.1, 23.2, 
23.3, 23.4, 23.5, 23.6, 23.7, 23.8, 23.9, 24, 24.1, 24.2, 24.3, 
24.4, 24.5, 24.6, 24.7, 24.8, 24.9, 25, 25.1, 25.2, 25.3, 25.4, 
25.5, 25.6, 25.7, 25.8, 25.9, 26, 26.1, 26.2, 26.3, 26.4, 26.5, 
26.6, 26.7, 26.8, 26.9, 27, 27.1, 27.2, 27.3, 27.4, 27.5, 27.6, 
27.7, 27.8, 27.9, 28, 28.1, 28.2, 28.3, 28.4, 28.5, 28.6, 28.7, 
28.8, 28.9, 29, 29.1, 29.2, 29.3, 29.4, 29.5, 29.6, 29.7, 29.8, 
29.9, 30), Y = c(625, 548, 586, 552, 557, 586, 552, 511, 529, 
506, 529, 497, 462, 484, 467, 471, 441, 462, 475, 552, 511, 471, 
416, 396, 380, 361, 328, 350, 388, 365, 303, 328, 357, 346, 320, 
317, 346, 339, 320, 376, 357, 361, 346, 400, 420, 433, 497, 449, 
388, 372, 361, 346, 342, 299, 279, 282, 306, 306, 289, 253, 266, 
259, 262, 237, 253, 237, 250, 234, 219, 231, 219, 243, 246, 204, 
225, 202, 207, 202, 219, 193, 216, 262, 286, 272, 216, 199, 193, 
185, 154, 154, 182, 169, 149, 144, 180, 154, 164, 139, 137, 139, 
137, 154, 144, 156, 142, 146, 159, 119, 137, 132, 151, 132, 128, 
132, 149, 119, 154, 151, 144, 144, 149, 161, 125, 149, 149, 156, 
139, 135, 142, 146, 130, 169, 132, 169, 149, 164, 216, 202, 188, 
166, 177, 164, 172, 182, 154, 188, 174, 196, 154, 149, 166, 135, 
144, 144, 144, 135, 137, 135, 146, 169, 137, 139, 123, 123, 137, 
137, 119, 149, 144, 132, 125, 119, 123, 135, 130, 123, 130, 130, 
142, 139, 132, 130, 123, 123, 121, 121, 121, 164, 121, 130, 130, 
146, 137, 146, 117, 139, 144, 130, 132, 144, 177, 159, 144, 161, 
172, 144, 169, 193, 222, 282, 272, 246, 207, 213, 196, 210, 234, 
204, 219, 213, 234, 256, 216, 259, 250, 276, 324, 313, 262, 213, 
204, 185, 164, 180, 164, 182, 169, 166, 151, 144, 128, 119, 146, 
137, 121, 164, 121, 144, 128, 128, 144, 135, 121, 139, 128, 144, 
130, 149, 119, 0)), row.names = c(NA, 260L), class = "data.frame")

Solution

  • Your earlier attempt was correct, except that you used lowercase x and y instead of the column names X and Y. Names in R are case-sensitive, so df$x does not refer to the X column.

    integrate(approxfun(df$X, df$Y), lower = 5.75, upper = 6.45)
    ## 327.8377 with absolute error < 0.03
    

    Update in response to the updated question

    To get the area of the trapezoidal region in your update, you only need the height of the curve at the two endpoints: the area is the mean of those two heights times the width of the interval.

    F <- approxfun(df$X, df$Y)
    ## Area of the trapezoid: mean endpoint height times width
    (F(5.75) + F(6.45)) / 2 * (6.45 - 5.75)
    ## [1] 293.825
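The same calculation can be wrapped in a small function. This is only a sketch: `chord_area` is a hypothetical name, it assumes linear interpolation between observations (as approxfun() does), and it uses the nine points of `df` around the interval so it runs standalone.

```r
# Hypothetical helper: area under the straight line (chord) joining the
# interpolated curve at x = a and x = b, down to y = 0.
chord_area <- function(x, y, a, b) {
  F <- approxfun(x, y)         # linear interpolation between observations
  (F(a) + F(b)) / 2 * (b - a)  # trapezoid: mean height times width
}

# Subset of df from the question (X from 5.7 to 6.5)
x <- c(5.7, 5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4, 6.5)
y <- c(441, 462, 475, 552, 511, 471, 416, 396, 380)

# F(5.75) = 451.5 and F(6.45) = 388, so (451.5 + 388) / 2 * 0.7
chord_area(x, y, 5.75, 6.45)
# [1] 293.825
```

Only the two endpoint heights enter the formula; the observations between 5.75 and 6.45 do not affect the result.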