I apologize if this question has been asked already. I'm a beginner to R and do not have an advanced stats background. I am trying to determine the number of peaks (maximums) for my data in R. For those familiar with GraphPad Prism, essentially I am trying to find the "Number of Peaks" by doing an Area Under the Curve analysis and using a threshold of y=2. My dataframe is below (called example1).
time a b c d e f
0 248 413 473 370 501 235
3 243 408 468 356 496 237
6 243 406 476 354 503 235
9 243 412 469 352 500 232
12 245 411 467 355 504 235
15 241 420 483 348 499 227
18 239 416 477 344 493 230
21 236 413 478 338 490 227
24 234 406 475 337 481 229
My x variable is the first column and my y variables are the remaining columns (the real dataset is very large, with 50+ dependent variables). For each dependent variable (column), I am trying to find the number of peaks (local maxima), using a y threshold of 2.
I have already plotted my data (code below) using ggplot by melting my dataframe.
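# Load packages (assuming melt() comes from reshape2)
library(reshape2)
library(ggplot2)
# Melt data to long format: one row per time/variable pair
melted <- melt(example1, id.vars = "time")
# Create line graph, one line per variable
ggplot(data = melted, aes(x = time, y = value, group = variable)) +
  geom_line(show.legend = TRUE)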
How can I calculate and output the number of peaks per column (with the y=2 threshold)? Will I need to do an area under the curve analysis? I'm hoping to calculate something like this (number of peaks per column)... a = 0, b = 3, c = 0, d = 6, e = 1, f = 0 but the output could be something like 0, 3, 0, 6, 1, 0
I do not need to produce another graph. I just need an output of number of peaks per dependent variable.
There is a findpeaks() function available through the pracma package that is exceptionally useful for this type of thing. See the package documentation for details. You can specify the threshold or go with the default settings. There are also some parameters to help ignore or include peaks that span multiple points.
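To make the threshold idea concrete, here is a minimal base R sketch. It is a simplified illustration, not pracma's actual algorithm: the helper count_simple_peaks() is a name I made up, and it only compares each point with its two immediate neighbours, so it can disagree with findpeaks() on plateaus or peaks that span several points.

```r
# Hypothetical helper (not part of pracma): count interior points that are
# higher than both immediate neighbours by at least `threshold`.
count_simple_peaks <- function(y, threshold = 0) {
  n <- length(y)
  if (n < 3) return(0L)          # need at least one interior point
  mid   <- y[2:(n - 1)]          # candidate peaks
  left  <- y[1:(n - 2)]          # value just before each candidate
  right <- y[3:n]                # value just after each candidate
  is_peak <- mid > left & mid > right &
    (mid - pmax(left, right)) >= threshold
  sum(is_peak)
}

# Column c from example1
c_values <- c(473, 468, 476, 469, 467, 483, 477, 478, 475)

count_simple_peaks(c_values)                 # every local maximum
count_simple_peaks(c_values, threshold = 2)  # only peaks at least 2 above both neighbours
```

With a threshold of 2, the small bump at 478 (only 1 above its neighbour 477) no longer counts as a peak.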
You feed findpeaks() the time-series vector (make sure it is ordered by your x axis first), and it returns a matrix with one row per peak. For each peak, the columns give the peak height (y value), the index of the peak, and the indices where the peak begins and ends. See the usage below with your example1 dataset:
library(pracma)
# Run findpeaks() on every dependent-variable column (columns 2 to 7)
peak_info <- lapply(example1[, 2:7], findpeaks, threshold = 2)
> peak_info
$a
     [,1] [,2] [,3] [,4]
[1,]  245    5    4    9

$b
     [,1] [,2] [,3] [,4]
[1,]  420    6    5    9

$c
     [,1] [,2] [,3] [,4]
[1,]  476    3    2    5
[2,]  483    6    5    7

$d
     [,1] [,2] [,3] [,4]
[1,]  355    5    4    9

$e
     [,1] [,2] [,3] [,4]
[1,]  503    3    2    4
[2,]  504    5    4    9

$f
     [,1] [,2] [,3] [,4]
[1,]  237    2    1    4
[2,]  235    5    4    6
[3,]  230    7    6    8
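Since the matrix comes back without column names, it can help to label the columns yourself. The sketch below does that for column c only. The names "height", "index", "start" and "end" are my own labels, not something pracma provides, and the time vector is rebuilt by hand from the example data.

```r
library(pracma)

# Column c and the time axis from example1, typed in by hand
time_values <- seq(0, 24, by = 3)
c_values    <- c(473, 468, 476, 469, 467, 483, 477, 478, 475)

peaks_c <- findpeaks(c_values, threshold = 2)

# Labels chosen for readability; findpeaks() itself returns an unnamed matrix
colnames(peaks_c) <- c("height", "index", "start", "end")
peaks_c

# Translate the peak positions back into time values
peak_times <- time_values[peaks_c[, "index"]]
peak_times
```

The index column refers to positions in the vector, so looking those positions up in the time column tells you when each peak occurred.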
If you just want to know the number of peaks, you can run the following:
> unlist(lapply(peak_info, nrow))
a b c d e f
1 1 2 1 2 3