Tags: ggplot2, overlap, density-plot

ggplot multiple densities with common density


I would like to plot something that is "between" a histogram and a density plot. Here is an example:

library(ggplot2)

set.seed(1)
f1 <- rep(1, 100)
v1 <- rnorm(100)
df1 <- data.frame(f1, v1)

f1 <- rep(2, 10)
v1 <- (rnorm(10)+1*2)
df2 <- data.frame(f1, v1)

df <- rbind(df1, df2)
df$f1 <- as.factor(df$f1)

ggplot(df, aes(x = v1, colour = f1)) +
  geom_density(position="identity", alpha = 0.6, fill = NA, size = 1)

You will see that the area under each curve is 1.0, which is correct for a density. But notice that the second distribution is made up of just 10 observations rather than the 100 of the first. I would like the area under curve 2 to reflect this, i.e. to be a tenth of that of curve 1. Thanks.

Two overlapping density plots estimated from different sample sizes


Solution

  • stat_density provides a computed variable called count (the density multiplied by the number of points) that you can map to y:

    ggplot(df, aes(x = v1, colour = f1)) +
      geom_density(position="identity", alpha = 0.6, fill = NA, size = 1,
                   aes(y = after_stat(count)))
    


    • Note: for ggplot2 < 3.3.0, use stat(count) instead of after_stat(count).

    You can find these computed variables in the geom_density documentation (?geom_density) under the section "Computed variables".
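
    To see why count gives the requested areas, here is a minimal base R sketch. It does not call ggplot2 internals: it assumes that count is simply the kernel density multiplied by the group size, as the documentation describes, and uses stats::density() as a stand-in for the estimate. The helpers count_curve and trapz are hypothetical names made up for this illustration.

```r
# Rebuild the question's data (same seed and draw order as the original code).
set.seed(1)
v_big   <- rnorm(100)      # group 1: 100 observations
v_small <- rnorm(10) + 2   # group 2: 10 observations, shifted by 2

# Hypothetical helper: kernel density scaled by sample size,
# mirroring the documented definition count = density * number of points.
count_curve <- function(v) {
  d <- density(v)          # base R kernel density estimate; area is about 1
  data.frame(x = d$x, count = d$y * length(v))
}

# Hypothetical helper: trapezoidal rule for a curve given as (x, y) points.
trapz <- function(x, y) {
  n <- length(x)
  sum(diff(x) * (y[-1] + y[-n]) / 2)
}

c_big   <- count_curve(v_big)
c_small <- count_curve(v_small)

area_big   <- trapz(c_big$x, c_big$count)      # close to 100
area_small <- trapz(c_small$x, c_small$count)  # close to 10
area_big / area_small                          # close to 10
```

    Each area comes out close to the number of observations in its group, so the smaller group's curve covers roughly a tenth of the area of the larger one. The values are approximate because the density is evaluated on a finite grid.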