Tags: r, ggplot2, kernel-density, density-plot

Correcting and unifying peaks in an overlay density plot in RStudio


I am having trouble comparing multiple densities and need help from the R masters.

I am comparing the densities of timestamp data collected over 7 different time periods (30, 45, 60, 90, 120, 180 and 240 minutes). I plotted an overlay KDE graph using ggplot2.


As a next step, I want to set the y-coordinate of each peak to 1 and scale the rest of each density curve accordingly. In other words, I want to multiply each KDE by the constant that makes its peak equal to 1, and then plot the adjusted curves as a 'matched peak' KDE graph.

How can I do this in RStudio?


Solution

  • This is somewhat hacky, and there may be a cleaner way of doing it. I'm going to use the built-in iris dataset here.

    library(ggplot2)
    library(dplyr)
    

    First, building the typical density plot:

    p <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) + 
      geom_density()
    p
    


    The ggplot_build() function allows you to access the plot's internal information.

    p_build <- ggplot_build(p)
    

    Within that list there's a data object that holds the mapped coordinates resulting from the geom_density() call. I'll grab that.

    p_mod <- p_build$data[[1]]
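
    To see what that data frame contains, it helps to look at its columns. The sketch below rebuilds the same plot and prints a few of them. The exact set of columns can vary between ggplot2 versions, so the four selected here are an assumption rather than a guarantee.

    ```r
    library(ggplot2)

    p <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
      geom_density()
    p_mod <- ggplot_build(p)$data[[1]]

    # All columns ggplot2 computed for the density layer
    names(p_mod)

    # x and y are the plotted coordinates; group and colour identify the curve
    head(p_mod[, c("x", "y", "group", "colour")])

    # Number of evaluation points per curve
    table(p_mod$group)
    ```

    Each row is one evaluated point on one curve, which is why the curves can be rescaled with ordinary data frame operations.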
    

    Then I make the adjustment. First I need to re-establish which group each colour refers to, and then, for each colour, I rescale y so that its maximum equals 1:

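    p_modded <- p_mod %>%
      # Map ggplot2's default three-colour palette back to the species names
      mutate(Species = case_when(colour == "#F8766D" ~ "setosa",
                                 colour == "#00BA38" ~ "versicolor",
                                 TRUE ~ "virginica")) %>% 
      group_by(colour) %>% 
      # Divide each curve by its own maximum so every peak sits at y = 1
      mutate(y = y / max(y)) %>% 
      ungroup()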
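
    Hard-coding the hex colours breaks as soon as the palette or the number of groups changes. A possible alternative, sketched below, recovers the labels from the group column instead. It assumes that the group index follows the order of the factor levels, which should hold when colour = Species is the only grouping aesthetic; check this on your own data before relying on it.

    ```r
    library(ggplot2)
    library(dplyr)

    p <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
      geom_density()
    p_mod <- ggplot_build(p)$data[[1]]

    p_modded <- p_mod %>%
      # Assumption: group 1, 2, 3 correspond to the factor levels in order
      mutate(Species = levels(iris$Species)[group]) %>%
      group_by(group) %>%
      # Rescale each curve so that its peak equals 1
      mutate(y = y / max(y)) %>%
      ungroup()

    # Peak height per species after rescaling
    tapply(p_modded$y, p_modded$Species, max)
    ```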
    

    And now a new chart. Note that I don't need geom_density() here because the density has already been calculated, so geom_line() is enough.

    ggplot(p_modded, aes(x = x, y = y, colour = Species)) + 
      geom_line()
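
    As for a cleaner way: stat_density() already computes a variable called scaled, which is the density estimate scaled to a maximum of 1 within each group. A minimal sketch, assuming ggplot2 3.3.0 or later for after_stat() (older versions used the ..scaled.. notation instead):

    ```r
    library(ggplot2)

    # Map y to the computed variable `scaled` instead of the default density
    p_scaled <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
      geom_density(aes(y = after_stat(scaled)))
    p_scaled
    ```

    This rescales each curve independently, just like y / max(y) in the manual version, but skips the ggplot_build() step and keeps the original legend.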
    
