Search code examples
rnumerical-methodsauc

Area Under the Curve using Simpson's rule in R


I would like to compute the Area Under the Curve defined by a set of experimental values. I created a function to calculate an aproximation of the AUC using the Simpson's rule as I saw in this post. However, the function only works when it receives a vector of odd length. How can I modify the code to add the area of the last trapezoid when the input vector has an even length.

AUC <- function(x, h=1){
  # AUC function computes the Area Under the Curve of a time serie using
  # the Simpson's Rule (numerical method).
  # https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
  
  # Arguments
  # x: (vector) time serie values
  # h: (int) temporal resolution of the time serie. default h=1
  
  n = length(x)-1
  
  xValues = seq(from=1, to=n, by=2)
  
  sum <- list()
  for(i in 1:length(xValues)){
    n_sub <- xValues[[i]]-1
    n <- xValues[[i]]
    n_add <- xValues[[i]]+1
    
    v1 <- x[[n_sub+1]]
    v2 <- x[[n+1]]
    v3 <- x[[n_add+1]]
    
    s <- (h/3)*(v1+4*v2+v3)
    sum <- append(sum, s)
  }
  sum <- unlist(sum)
  
  auc <- sum(sum)
  
  return(auc)
  
}

Here a data example:

smoothed = c(0.3,0.317,0.379,0.452,0.519,0.573,0.61,0.629,0.628,0.613,0.587,0.556,0.521,
             0.485,0.448,0.411,0.363,0.317,0.273,0.227,0.185,0.148,0.12,0.103,0.093,0.086,
             0.082,0.079,0.076,0.071,0.066,0.059,0.053,0.051,0.052,0.057,0.067,0.081,0.103,
             0.129,0.165,0.209,0.252,0.292,0.328,0.363,0.398,0.431,0.459,0.479,0.491,0.494,
             0.488,0.475,0.457,0.43,0.397,0.357,0.316,0.285,0.254,0.227,0.206,0.189,0.181,
             0.171,0.157,0.151,0.162,0.192,0.239)

Solution

  • One recommended way to handle an even number of points and still achieve precision is to combine Simpson's 1/3 rule with Simpson's 3/8 rule, which can handle an even number of points. Such approaches can be found in (at least one or perhaps more) engineering textbooks on numerical methods.

    However, as a practical matter, you can write a code chunk to check the data length and add a single trapezoid at the end, as was suggested in the last comment of the post to which you linked. I wouldn't assume that it is necessarily as precise as combining Simpson's 1/3 and 3/8 rules, but it is probably reasonable for many applications.

    I would double-check my code edits below, but this is the basic idea.

    AUC <- function(x, h=1){
      # AUC function computes the Area Under the Curve of a time serie using
      # the Simpson's Rule (numerical method).
      # https://link.springer.com/chapter/10.1007/978-1-4612-4974-0_26
      
      # Arguments
      # x: (vector) time serie values
      # h: (int) temporal resolution of the time serie. default h=1
      
      #jh edit: check for even data length
      #and chop off last data point if even
      nn = length(x)
      if(length(x) %% 2 == 0){
      xlast = x[length(x)]
      x = x[-length(x)]
      }
    
      n = length(x)-1
      
      xValues = seq(from=1, to=n, by=2)
      
      sum <- list()
      for(i in 1:length(xValues)){
        n_sub <- xValues[[i]]-1
        n <- xValues[[i]]
        n_add <- xValues[[i]]+1
        
        v1 <- x[[n_sub+1]]
        v2 <- x[[n+1]]
        v3 <- x[[n_add+1]]
        
        s <- (h/3)*(v1+4*v2+v3)
        sum <- append(sum, s)
      }
      sum <- unlist(sum)
      
      auc <- sum(sum)
      ##jh edit: add trapezoid for last two data points to result
      if(nn %% 2 == 0){
      auc <- auc + (x[length(x)] + xlast)/2 * h
      }
      
      
      return(auc)
      
    }
    
    
    sm = smoothed[-length(smoothed)]
    length(sm)
    [1] 70
    #even data as an example
    AUC(sm)
    [1] 20.17633
    #original odd data
    AUC(smoothed)
    [1] 20.389