Tags: r, ggplot2, histogram, kernel-density, density-plot

How to avoid a flat density line in ggplot2


I'm trying to plot a density line over two overlapping histograms, but whatever code I use, the line comes out "flat".

I have to create two histograms, each from a normal distribution with a different number of samples. Then I have to overlay them and draw the density line, all with the ggplot2 package.

This is what I've tried:

library(ggplot2)

# n, mean and sd stand for the chosen sample size and distribution parameters
xx <- data.frame(dat = rnorm(n, mean, sd))
yy <- data.frame(dat = rnorm(n, mean, sd))
both <- rbind(xx, yy)

ggplot(both, aes(x = dat)) + 
    geom_histogram(data = xx, fill = "red", alpha = 0.2, binwidth = 0.25) + 
    geom_histogram(data = yy, fill = "blue", alpha = 0.2, binwidth = 0.25) +
    theme_light() +
    geom_line(data = both, stat = "density")

I also tried geom_density(), but the result is the same.


Solution

  • The density line is not flat; it is drawn on a very different scale from the histograms, because geom_histogram() puts counts on the y-axis by default, whereas the density line integrates to 1.

    Map y = after_stat(density) in both histograms so that they are drawn on the same density scale as the line:

    # packages
    library(ggplot2)
    
    # data
    set.seed(1)
    sample1 <- data.frame(dat = rnorm(10000, 0, 1))
    sample2 <- data.frame(dat = rnorm(15000, 3, 1))
    both <- rbind(sample1, sample2)
    
    # plot: both histograms on the density scale, plus a density line for the pooled data
    ggplot(both, aes(x = dat)) + 
      geom_histogram(aes(y = after_stat(density)), data = sample1, fill = "red", alpha = 0.2, binwidth = 0.25) + 
      geom_histogram(aes(y = after_stat(density)), data = sample2, fill = "blue", alpha = 0.2, binwidth = 0.25) +
      theme_light() +
      geom_line(stat = "density")  # inherits `both` from ggplot()
    

    Created on 2020-04-30 by the reprex package (v0.3.0)

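    The black line is a kernel density estimate of the pooled data (both), i.e. an estimate of a mixture of the two normal distributions weighted by sample size, so it does not follow the top of each histogram exactly. See the help page of after_stat() for details on computed variables such as density.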
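The mixture point can be checked numerically. The pooled data contain 10000 draws from N(0, 1) and 15000 from N(3, 1), so the density the black line estimates is 0.4 * dnorm(x, 0, 1) + 0.6 * dnorm(x, 3, 1); its two bumps peak at roughly 0.16 and 0.24, while each separately normalised histogram peaks near 0.4. The sketch below uses base R's density(), assuming it is a fair stand-in for stat_density() because both default to a Gaussian kernel and the "nrd0" bandwidth rule (their evaluation grids differ, so the numbers will not match the plot point for point).

```r
# Sketch: compare the pooled KDE with a sample-size-weighted mixture of the two normals.
set.seed(1)
sample1 <- data.frame(dat = rnorm(10000, 0, 1))
sample2 <- data.frame(dat = rnorm(15000, 3, 1))
both <- rbind(sample1, sample2)

# Mixture weights: each sample's share of the pooled data (0.4 and 0.6 here)
w1 <- nrow(sample1) / nrow(both)
w2 <- nrow(sample2) / nrow(both)

# Kernel density estimate of the pooled data (Gaussian kernel, bw = "nrd0" by default)
kde <- density(both$dat)

# Theoretical mixture density evaluated on the same grid as the KDE
mix <- w1 * dnorm(kde$x, mean = 0, sd = 1) + w2 * dnorm(kde$x, mean = 3, sd = 1)

# Largest pointwise gap between estimate and mixture, and the height of each bump
max_gap <- max(abs(kde$y - mix))
peak_left <- max(kde$y[kde$x < 1.5])
peak_right <- max(kde$y[kde$x > 1.5])
print(c(max_gap = max_gap, peak_left = peak_left, peak_right = peak_right))
```

Both bumps come out well below the height of the individual density histograms, which is why the line sits low in the plot even though its shape is correct.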
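If you would rather keep counts on the y-axis, a different approach from the one above is to leave the histograms on their default count scale and rescale the line instead. stat_density() provides the computed variable count (density times the number of points), and the expected count in a histogram bin is roughly density * n * binwidth, so count * 0.25 puts the line on the histograms' scale. This is a sketch assuming ggplot2 3.3.0 or later (for after_stat()) and the same simulated data and binwidth = 0.25 as above.

```r
library(ggplot2)

set.seed(1)
sample1 <- data.frame(dat = rnorm(10000, 0, 1))
sample2 <- data.frame(dat = rnorm(15000, 3, 1))
both <- rbind(sample1, sample2)

p <- ggplot(both, aes(x = dat)) +
  # histograms keep their default count scale
  geom_histogram(data = sample1, fill = "red", alpha = 0.2, binwidth = 0.25) +
  geom_histogram(data = sample2, fill = "blue", alpha = 0.2, binwidth = 0.25) +
  # count = density * n; multiplying by the binwidth (0.25) gives the expected count per bin
  geom_line(aes(y = after_stat(count * 0.25)), stat = "density") +
  theme_light()
p
```

Because the line still describes the pooled data, where the two histograms overlap (around x = 1.5) it follows the sum of both sets of bars rather than either one. The binwidth is written out in three places here, so keep them in sync if you change it.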