I am having trouble comparing multiple densities and would appreciate help from R experts.
I am comparing timestamp data collected from 7 different time periods (30, 45, 60, 90, 120, 180 and 240 minutes) in terms of density. I plotted an overlaid KDE graph using ggplot2.
As the next step, I want to set the y-coordinate of each peak to 1 and adjust the rest of the density points accordingly. In other words, I will multiply each KDE function by a constant that makes its peak equal to 1, and then visualize the adjusted data by plotting a 'matched peak' KDE graph.
How can I do this in RStudio?
This is kind of hackish, and there may be a cleaner way of doing it. I'm going to use the iris dataset here.
library(ggplot2)
library(dplyr)
First, building the typical density plot:
p <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
  geom_density()
p
The ggplot_build() function allows you to access the plot's internal information.
p_build <- ggplot_build(p)
Within that list, there's a data object that holds the mapped coordinates resulting from the geom_density() call. I'll grab that.
p_mod <- p_build$data[[1]]
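Before changing anything, it can help to look at what the extracted data frame contains. This is a minimal sketch that rebuilds the same plot so it runs on its own; the columns it selects (x, y, colour, group) are the ones geom_density() produces for this plot, and other geoms or mappings will give a different layout.

```r
library(ggplot2)

# Rebuild the same density plot and pull out the computed layer data
p <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
  geom_density()
p_mod <- ggplot_build(p)$data[[1]]

# One row per evaluation point of each curve
str(p_mod[, c("x", "y", "colour", "group")])

# Number of points per curve, and the current peak height of each curve
table(p_mod$group)
peaks <- tapply(p_mod$y, p_mod$group, max)
peaks
```

The peak heights in the last line are the constants each curve will be divided by in the next step.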
Then I make the adjustment. First I need to re-establish which group each colour refers to, and then, within each colour, I divide the y values by their maximum so that every curve peaks at 1:
p_modded <- p_mod %>%
  mutate(Species = case_when(colour == "#F8766D" ~ "setosa",
                             colour == "#00BA38" ~ "versicolor",
                             TRUE ~ "virginica")) %>%
  group_by(colour) %>%
  mutate(y = y / max(y)) %>%
  ungroup()
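Matching on hex codes breaks as soon as the palette or the number of groups changes. A possible variation, sketched here under the assumption that ggplot2 numbers the group column in the order of the factor levels (which holds for a single discrete mapping like this one), is to recover the labels from group instead. The names species_levels and p_modded_alt are just illustrative.

```r
library(ggplot2)
library(dplyr)

p <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +
  geom_density()
p_mod <- ggplot_build(p)$data[[1]]

# Assumption: group 1, 2, 3 correspond to the factor levels in order
species_levels <- levels(iris$Species)

p_modded_alt <- p_mod %>%
  mutate(Species = factor(species_levels[group], levels = species_levels)) %>%
  group_by(Species) %>%
  mutate(y = y / max(y)) %>%   # rescale so each curve peaks at 1
  ungroup()
```

With 7 time periods, this avoids writing out 7 colour codes by hand; only the vector of levels needs to match the grouping variable.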
And now a new chart. Note that I don't need to use geom_density() because the density has already been calculated, so I just need to use geom_line() instead.
ggplot(p_modded, aes(x = x, y = y, colour = Species)) +
  geom_line()
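As for a cleaner way: the density stat in ggplot2 also computes a variable called scaled, which is the density estimate rescaled to a maximum of 1 within each group. The sketch below maps y to it directly. It assumes a ggplot2 version that provides after_stat() (3.3.0 or later); older versions used the ..scaled.. notation instead. The name p_scaled is just illustrative.

```r
library(ggplot2)

# Map y to the per-group scaled density instead of the raw density
p_scaled <- ggplot(iris, aes(x = Sepal.Length,
                             y = after_stat(scaled),
                             colour = Species)) +
  geom_density()
p_scaled
```

This should give the same matched-peak curves without touching the plot internals, and it carries over to the timestamp data by swapping in the time-period variable for Species.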