r, ggplot2, density-plot

Identify spikes/peaks in density plot by group



I created a density plot with the ggplot2 package for R, with one curve per group. I would like to identify which groups produce the spikes/peaks that occur between 0.01 and 0.02. There are too many groups for the legend to be readable, so I removed it. I filtered my data to find the group with the most rows between 0.01 and 0.02, then removed that group from the data to see whether the spike would disappear, but it is still plotted. Can you suggest a way to identify the groups behind these spikes/peaks?

Here is my code:

ggplot(NumofHitsnormalized, aes(NumofHits_norm, fill = name)) + geom_density(alpha=0.2) + theme(legend.position="none") + xlim(0.0 , 0.15) 

## Keep only the rows that fall in the range of the first spike
test <- NumofHitsnormalized[which(NumofHitsnormalized$NumofHits_norm > 0.01 & NumofHitsnormalized$NumofHits_norm < 0.02), ]

## Count the rows per group (name column) within that range; I thought the
## group with the most rows might be the one that leads to the spike
groups <- unique(test$name)
testMatrix <- matrix(ncol = 2, nrow = length(groups))
for (i in seq_along(groups)) {
  testMatrix[i, 1] <- as.character(groups[i])
  testMatrix[i, 2] <- sum(test$name == groups[i])  # nrow() on a single value returns NULL
}

[Image: density plot after filtering with the extremevalues package]

Konrad,

This is the new plot, made after I filtered my data with the extremevalues package. There are new peaks at different intervals, and 96% of the initial groups still have data in the new plot (even though the filtered data has only 0.023% of the rows of the initial dataset), so I can't identify which peaks belong to which groups.
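One possible way to attribute peaks to groups is to estimate each group's density separately and record where its highest point lies. Each curve drawn by geom_density is normalized to area 1 within its group, so the height of a peak reflects how concentrated a group's values are, not how many rows it has. A small group with tightly clustered values can therefore produce the tallest spike. The sketch below uses made-up data in place of NumofHitsnormalized; the helper `peak_of` and the three groups are hypothetical, and it assumes the default bandwidth of `density()`.

```r
# Synthetic stand-in for NumofHitsnormalized (hypothetical data, three groups)
set.seed(1)
df <- data.frame(
  name = rep(c("a", "b", "c"), times = c(2000, 500, 40)),
  NumofHits_norm = c(runif(2000, 0, 0.15),      # many rows, spread out flat
                     rnorm(500, 0.08, 0.02),    # broad peak near 0.08
                     rnorm(40, 0.015, 0.001))   # few rows, sharp spike near 0.015
)

# For one group's values, estimate the density and return its highest point
peak_of <- function(x) {
  d <- density(x)          # kernel density estimate with default settings
  i <- which.max(d$y)
  c(peak_x = d$x[i], peak_height = d$y[i])
}

# One row per group: location and height of that group's tallest peak
peaks <- as.data.frame(t(sapply(split(df$NumofHits_norm, df$name), peak_of)))
peaks$name <- rownames(peaks)

# Groups whose tallest peak lies in the range of interest, tallest first
hits <- peaks[peaks$peak_x > 0.01 & peaks$peak_x < 0.02, ]
hits <- hits[order(-hits$peak_height), ]
print(hits)
```

Plotting only the groups listed in `hits` (with the legend switched back on) is one way to check whether they account for the spike.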


Solution

  • I had a similar problem.

    What I did was compute a rolling mean and a rolling sd of the y values with a window of 3.

    Calculate the average sd of your baseline data (the data you know won't have peaks).

    Set a threshold value based on it.

    Flag each point as 1 if it is above the threshold, else 0.

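    library(TTR)  # provides runMean() and runSD()

    # Rolling mean and sd of the signal over a 3-point window
    d5$roll_mean = runMean(d5$`Current (pA)`, n = 3)
    d5$roll_sd = runSD(x = d5$`Current (pA)`, n = 3)
    # Flag points whose rolling sd exceeds the threshold (1 here)
    d5$delta = ifelse(d5$roll_sd > 1, 1, 0)
    currents = subset(d5, delta == 1)  # rows flagged as peaks; subset() drops NA rows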
    

    My threshold was sd > 1. Depending on your data, you may want to threshold on the mean or on the sd. For slowly rising peaks, the mean would be a better choice than the sd.
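The steps above can also be sketched in base R, with the threshold derived from the baseline instead of being hard-coded. This is only an illustration under stated assumptions: the signal `y` is made up, the helper `roll_sd` is hypothetical (a simple trailing window, not the TTR implementation), the first 50 points are assumed to be peak-free, and the factor 5 is an arbitrary choice.

```r
# Hypothetical signal: noisy flat baseline with two injected peaks
set.seed(42)
y <- rnorm(200, mean = 0, sd = 0.1)
y[80:82]   <- c(3, 6, 3)
y[150:152] <- c(2, 5, 2)

# Trailing rolling sd over n points (NA for the first n - 1 positions)
roll_sd <- function(x, n = 3) {
  out <- rep(NA_real_, length(x))
  for (i in n:length(x)) out[i] <- sd(x[(i - n + 1):i])
  out
}
rs <- roll_sd(y, n = 3)

# Average rolling sd over a stretch assumed to contain no peaks
baseline_sd <- mean(rs[1:50], na.rm = TRUE)
threshold   <- 5 * baseline_sd   # arbitrary multiplier, tune for your data

# TRUE where the rolling sd exceeds the threshold
flag <- !is.na(rs) & rs > threshold
which(flag)                      # positions where the signal changes sharply
```

Because the window trails, every window that overlaps a peak is flagged, so each peak shows up as a short run of consecutive positions.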