Tags: r, ggplot2, statistics, auc

Is there a way to calculate the number of peaks above a threshold for multiple dependent variables in R?


I apologize if this question has already been asked. I'm new to R and do not have an advanced statistics background. I am trying to determine the number of peaks (local maxima) in my data in R. For those familiar with GraphPad Prism, I am essentially trying to reproduce the "Number of Peaks" result from an Area Under the Curve analysis with a threshold of y=2. My data frame, called example1, is below.

time a  b   c   d   e   f
0   248 413 473 370 501 235
3   243 408 468 356 496 237
6   243 406 476 354 503 235
9   243 412 469 352 500 232
12  245 411 467 355 504 235
15  241 420 483 348 499 227
18  239 416 477 344 493 230
21  236 413 478 338 490 227
24  234 406 475 337 481 229

My x variable is the first column, and my y variables are the remaining columns (the real dataset is very large, with 50+ dependent variables). For each dependent variable (column), I am trying to find the number of peaks (local maxima). I need to set my y threshold to 2.

I have already plotted my data (code below) using ggplot by melting my dataframe.

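#Load packages (melt() comes from reshape2)
library(reshape2)
library(ggplot2)

#Melt data to long format: one row per (time, variable) pair
melted <- melt(example1, id.vars="time")

#Create line graph
ggplot(data=melted, aes(x=time, y=value, group=variable)) + 
  geom_line(show.legend = TRUE)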

How can I calculate and output the number of peaks per column (with the y=2 threshold)? Will I need to do an area under the curve analysis? I'm hoping to calculate something like this (number of peaks per column)... a = 0, b = 3, c = 0, d = 6, e = 1, f = 0 but the output could be something like 0, 3, 0, 6, 1, 0

I do not need to produce another graph. I just need an output of number of peaks per dependent variable.


Solution

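  • The findpeaks() function in the pracma package is very useful for this kind of task; see the pracma documentation for findpeaks() for details. You can specify the threshold or go with the default settings. Note that, as I read the documentation, threshold is the minimum amount by which a peak must exceed its surroundings, while minpeakheight sets a minimum absolute height. Other parameters (such as nups, ndowns, and minpeakdistance) help ignore or include peaks that span multiple points.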

    You feed findpeaks() the time-series vector (make sure it is ordered by your x axis first), and it returns a matrix with one row per peak. For each peak, the columns give the peak height (y value), the index of the peak, the index where the peak begins, and the index where it ends. See the usage below with your example1 dataset:

    library(pracma)

    peak_info <- lapply(example1[,2:7], findpeaks, threshold=2)
    
    > peak_info
    $a
         [,1] [,2] [,3] [,4]
    [1,]  245    5    4    9
    
    $b
         [,1] [,2] [,3] [,4]
    [1,]  420    6    5    9
    
    $c
         [,1] [,2] [,3] [,4]
    [1,]  476    3    2    5
    [2,]  483    6    5    7
    
    $d
         [,1] [,2] [,3] [,4]
    [1,]  355    5    4    9
    
    $e
         [,1] [,2] [,3] [,4]
    [1,]  503    3    2    4
    [2,]  504    5    4    9
    
    $f
         [,1] [,2] [,3] [,4]
    [1,]  237    2    1    4
    [2,]  235    5    4    6
    [3,]  230    7    6    8
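
    Since the matrix columns are unnamed and refer to row indices rather than time values, it can help to translate them back to the x axis. The following is a minimal sketch, not part of pracma: label_peaks() and its column names are hypothetical, and the data frame holds only columns a and f of example1 to keep the example short.

    ```r
    library(pracma)

    # Subset of example1 from the question (columns a and f only).
    example1 <- data.frame(
      time = seq(0, 24, by = 3),
      a = c(248, 243, 243, 243, 245, 241, 239, 236, 234),
      f = c(235, 237, 235, 232, 235, 227, 230, 227, 229)
    )

    # Hypothetical helper: run findpeaks() and return a labelled data frame
    # whose positions are expressed as time values instead of row indices.
    label_peaks <- function(y, time, ...) {
      m <- findpeaks(y, ...)
      if (is.null(m)) {
        # findpeaks() returns NULL when no peak is found
        return(data.frame(height = numeric(0), peak_time = numeric(0),
                          start_time = numeric(0), end_time = numeric(0)))
      }
      data.frame(
        height     = m[, 1],        # y value at the peak
        peak_time  = time[m[, 2]],  # x value at the peak
        start_time = time[m[, 3]],  # x value where the peak begins
        end_time   = time[m[, 4]]   # x value where the peak ends
      )
    }

    label_peaks(example1$f, example1$time, threshold = 2)
    ```

    For column f this should give the same three peaks shown above, located at times 3, 12, and 18.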
    

    If you just want to know the number of peaks, you can run the following:

    > unlist(lapply(peak_info, nrow))
    
    a b c d e f 
    1 1 2 1 2 3
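
    One caveat with the nrow() approach: findpeaks() returns NULL for a column with no peaks, nrow(NULL) is also NULL, and unlist() silently drops NULL entries, so a zero-peak column would vanish from the result instead of showing 0. With 50+ columns that is likely to happen somewhere. The following is a minimal sketch of a count that keeps zeros; count_peaks() is a hypothetical helper name, and the data frame reproduces example1 from the question.

    ```r
    library(pracma)

    # example1 as given in the question.
    example1 <- data.frame(
      time = seq(0, 24, by = 3),
      a = c(248, 243, 243, 243, 245, 241, 239, 236, 234),
      b = c(413, 408, 406, 412, 411, 420, 416, 413, 406),
      c = c(473, 468, 476, 469, 467, 483, 477, 478, 475),
      d = c(370, 356, 354, 352, 355, 348, 344, 338, 337),
      e = c(501, 496, 503, 500, 504, 499, 493, 490, 481),
      f = c(235, 237, 235, 232, 235, 227, 230, 227, 229)
    )

    # Hypothetical helper: number of peaks in one column, 0 if none are found.
    count_peaks <- function(y, ...) {
      m <- findpeaks(y, ...)
      if (is.null(m)) 0L else nrow(m)
    }

    # Apply to every column except time; vapply() guarantees one integer
    # per column, so the result stays aligned with the column names.
    peak_counts <- vapply(example1[, -1], count_peaks, integer(1), threshold = 2)
    peak_counts

    # A strictly increasing series has no peaks and is reported as 0.
    count_peaks(c(1, 2, 3, 4, 5))
    ```

    Using example1[, -1] rather than example1[, 2:7] means the same call works unchanged when the data frame has more dependent variables.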