Suppose I have two vectors, one called residues and a second one called scores, which holds 31 scores, one for each residue, all positive numbers. To illustrate, the two vectors were obtained as shown below:
residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)
I am using a random sequence only as an example.
If I plot the residues versus the scores, I get the following graph:
What I want to do is the following: I will consider every window of 15 consecutive residues (henceforth referred to as a 15mer), sliding by one residue at a time (i.e. 1:15, 2:16, 3:17, all the way up to 17:31), and I want to calculate the area under the curve (AUC) for each of these 17 windows. My final goal is to select the 15mer that has the highest AUC.
The AUC can be calculated using the rollmean function from the zoo package, as shown in this question. However, as I have, in this example, 17 possible windows, I am trying to find a script to automate the process. Thanks in advance.
library(zoo)
set.seed(555)
residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)
which.max(sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))}))
# result: 7, i.e. the 15mer covering residues 7:21
or
sapply(1:17, function(x){sum(diff(residues[x:(x+14)])*rollmean(scores[x:(x+14)],2))}) # gives you the AUCs
# result [1] 28.52530 29.10203 28.52847 27.65325 27.19925 28.77782 29.29373 28.13133 28.23705 27.68724 25.75294 25.27226 25.44963 25.81201 25.49907 23.48632
#[17] 22.45763
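In case the idiom is unfamiliar: sum(diff(x) * rollmean(y, 2)) is the trapezoidal rule, where rollmean(y, 2) averages each adjacent pair of scores and diff(x) gives the width of each interval. A minimal base-R sketch of the same quantity for a single window follows. The helper name trapz is hypothetical (it is not part of zoo or base R), and the sketch assumes x is sorted in increasing order.

```r
# Trapezoidal rule in base R, without zoo.
# `trapz` is a hypothetical helper name chosen for this illustration.
trapz <- function(x, y) {
  n <- length(x)
  # width of each interval times the mean height of its two endpoints
  sum(diff(x) * (y[-1] + y[-n]) / 2)
}

set.seed(555)
residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)

# AUC of the first 15mer (residues 1:15)
trapz(residues[1:15], scores[1:15])
```

Because the residues are spaced one unit apart, diff(x) is 1 everywhere here, so each window's AUC reduces to the sum of the pairwise means of its scores.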
or with a custom function
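# AUC (trapezoidal rule) for every window of `lngth` consecutive points
f_AUC <- function(x, y, lngth){
  sapply(1:(length(x)-lngth+1), function(z) {
    idx <- z:(z+lngth-1)  # indices of the current window
    sum(diff(x[idx])*rollmean(y[idx],2))
  })
}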
f_AUC(x=residues, y=scores, lngth=15)
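If you also want the residue range of the winning 15mer rather than only its index, something along these lines might help. It is only a sketch, not a replacement for the zoo version: best_window is a hypothetical name, it uses base R only, and it assumes x is sorted and lngth is at least 2. Each trapezoid is computed once and the window sums are taken as differences of a cumulative sum, which avoids recomputing overlapping windows.

```r
# Hypothetical helper: AUC of every sliding window plus the best window's range.
best_window <- function(x, y, lngth) {
  n <- length(x)
  stopifnot(length(y) == n, lngth >= 2, lngth <= n)
  # area of the trapezoid between each pair of consecutive points
  areas <- diff(x) * (y[-1] + y[-n]) / 2
  # cumulative area with a leading 0, so a window sum is a simple difference
  cum <- c(0, cumsum(areas))
  starts <- 1:(n - lngth + 1)
  auc <- cum[starts + lngth - 1] - cum[starts]
  best <- which.max(auc)  # first window with the highest AUC
  list(auc = auc, start = x[best], end = x[best + lngth - 1])
}

set.seed(555)
residues <- 1:31
scores <- runif(n = 31, min = 0.35, max = 3.54)

res <- best_window(residues, scores, lngth = 15)
res$auc               # one AUC per 15mer
c(res$start, res$end) # first and last residue of the best 15mer
```

The auc element can be compared against the output of f_AUC above. If there are ties, which.max returns the first window with the maximum.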