
R - maximize area under curve for multiple scenarios


I have two vectors, one called residues and a second one called scores. The scores vector holds 31 values, one for each residue, all positive numbers. To illustrate, the two vectors were obtained as shown below:

residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)

I am using a random sequence only as an example. Plotting the scores against the residues gives the following graph:

[Figure: scores plotted against residues]

What I want to do is the following: I want to consider every window of 15 consecutive residues (henceforth referred to as a 15mer), moving the window along by one residue each time (i.e. 1:15, 2:16, 3:17, all the way up to 17:31), and calculate the area under the curve (AUC) for each of these 17 windows. My final goal is to select the 15mer that has the highest AUC.
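To make the windows concrete, here is a small sketch that only enumerates the start and end index of each 15mer. The names `n_res`, `width`, and `windows` are introduced purely for illustration.

```r
# Illustration only: enumerate the sliding 15-residue windows.
n_res <- 31   # number of residues, as in the example
width <- 15   # window length (15mer)

# One window per possible start position: 1, 2, ..., n_res - width + 1
starts <- seq_len(n_res - width + 1)
windows <- data.frame(start = starts,
                      end   = starts + width - 1)

nrow(windows)                    # number of windows
windows[c(1, nrow(windows)), ]   # first and last window
```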

The AUC can be calculated using the rollmean function from the zoo package, as shown in this question. However, since this example has 17 possible windows, I am looking for a script to automate the process. Thanks in advance.
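For reference, the rollmean approach amounts to the trapezoidal rule: the mean of each pair of adjacent scores is multiplied by the spacing between the corresponding residues, and the products are summed. A minimal base-R sketch of the same calculation, without zoo, might look like this (`trapz_auc` is a hypothetical helper name, not a function from any package):

```r
# Hypothetical helper: trapezoidal area under the points (x, y).
trapz_auc <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(y)
  # Mean height of each pair of adjacent points
  pair_means <- (y[-1] + y[-n]) / 2
  # Width of each segment times its mean height, summed
  sum(diff(x) * pair_means)
}

trapz_auc(1:3, c(1, 2, 3))  # segments contribute 1.5 and 2.5, total 4
```

Because the residues here are spaced exactly one unit apart, every segment width is 1 and the AUC of a window reduces to the sum of its pair means.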


Solution

  • library(zoo)
    
    set.seed(555)
    residues <- 1:31
    scores <- runif(n = 31, min = 0.35, max = 3.54)
    
    
    # AUC of each of the 17 windows (trapezoidal rule), then the index of the largest
    which.max(sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))}))
    # result: 7, i.e. the window 7:21
    

    or, to see the AUCs themselves:

    sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))})
    # result:
    #  [1] 28.52530 29.10203 28.52847 27.65325 27.19925 28.77782 29.29373 28.13133 28.23705 27.68724 25.75294 25.27226 25.44963 25.81201 25.49907 23.48632
    # [17] 22.45763
    

    or with a custom function

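    # AUC (trapezoidal rule) of every window of lngth consecutive points
    f_AUC <- function(x, y, lngth){
      # seq_len() yields no windows at all if lngth is longer than x
      sapply(seq_len(length(x)-lngth+1), function(z) sum(diff(x[z:(z+lngth-1)])*rollmean(y[z:(z+lngth-1)],2)))
    }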
    
    f_AUC(x=residues, y=scores, lngth=15)
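
If the goal is the window itself rather than its index, a small wrapper can report the start, end, and AUC of the best window. This is only a sketch in base R (so it does not need zoo), and it assumes that ties should be resolved by taking the first maximum, which is what which.max does. The name `best_window` and the toy data are hypothetical.

```r
# Hypothetical wrapper: find the window of `width` consecutive points
# with the largest trapezoidal AUC and report where it lies.
best_window <- function(x, y, width) {
  stopifnot(length(x) == length(y), width >= 2, width <= length(x))
  starts <- seq_len(length(x) - width + 1)
  aucs <- vapply(starts, function(s) {
    idx <- s:(s + width - 1)
    xs <- x[idx]
    ys <- y[idx]
    # Trapezoidal rule on this window
    sum(diff(xs) * (ys[-1] + ys[-width]) / 2)
  }, numeric(1))
  best <- which.max(aucs)  # first maximum if there are ties
  list(start = x[best], end = x[best + width - 1], auc = aucs[best])
}

# Toy data, small enough to check by hand
toy <- best_window(x = 1:5, y = c(1, 1, 5, 6, 1), width = 3)
toy$start  # 3
toy$end    # 5
toy$auc    # 9
```

With the vectors from the question, the call would be best_window(residues, scores, 15).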