
Overlay normal curve to histogram in ggplot2


I want to plot a histogram of a vector called "dist" (the dist column of the data frame dist), which has a normal distribution, and overlay a normal curve using the population parameters. I found several Stack Overflow posts on the same topic, but none that address the error messages I'm getting.

plot1 <- ggplot(data = dist) + 
  geom_histogram(mapping = aes(x = dist), fill="steelblue", colour="black", binwidth = 1) +
  ggtitle("Frequencies")


I've tried several ways of adding a normal curve to this plot:

First, adding stat_function() to the histogram code with the required parameters:

stat_function(fun = dnorm, args = list(mean = mu2, sd = sd2))

But this doesn't add anything to the plot; the result is the same histogram as before.

Second, computing the curve data myself and adding it to the plot with geom_line():

#Create the curve data
x <- seq(8, 24, length.out=100)
y <- with(dist, data.frame(x = x, y = dnorm(x, mean(mu2), sd(sd2))))

#add the curve to the base plot
plot1 + geom_line(data = y, aes(x = x, y = y), color = "red")

This gives the following warning:

Removed 100 row(s) containing missing values (geom_path).

But I can't find any missing or null values in the vector, so I'm not sure how to solve this.
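A likely cause of this warning, assuming mu2 and sd2 are single numbers (their definitions are not shown): sd() of a length-one vector is NA, because a standard deviation needs at least two values. dnorm(x, mean(mu2), sd(sd2)) then returns NA for every x, and geom_line() drops all 100 rows. The sketch below uses hypothetical values for mu2 and sd2 only to illustrate this; passing the parameters to dnorm() directly avoids the NAs.

```r
# Hypothetical stand-ins for the question's population parameters
mu2 <- 16
sd2 <- 2

x <- seq(8, 24, length.out = 100)

# sd() of a single number is NA, so every density value becomes NA
# and ggplot2 removes all 100 rows when drawing the line
y_bad <- dnorm(x, mean(mu2), sd(sd2))

# Passing the parameters directly gives finite density values
y_good <- dnorm(x, mean = mu2, sd = sd2)
curve_df <- data.frame(x = x, y = y_good)

sum(is.na(y_bad))   # 100
sum(is.na(y_good))  # 0
```

Even with finite values, a density curve drawn on a count-scaled histogram will sit close to the x-axis, which is the scaling issue addressed in the solution.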

I can do this easily without ggplot2, but I'd like to do it in ggplot2:

hist(dist$dist, freq =FALSE, main="histogram")
curve(dnorm(x, mean = mu2, sd = sd2), from = 8, to = 24, add = TRUE)

Solution

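  • I suspect that stat_function does add the normal density, but the curve is squashed against the bottom of the plot by the y-axis range: density values are small (a normal density never exceeds 1/(sd·√(2π)), about 0.4 for sd = 1), whereas the histogram's y-axis shows counts. If you scale your histogram to a density with aes(x = dist, y = after_stat(density)) instead of absolute counts, your curve from dnorm should become visible. (Older code writes this as y = ..density..; that spelling is deprecated since ggplot2 3.4.0.)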

    (As a side note, your distribution does not look normal to me. You might want to check, e.g. with a Q-Q plot.)

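    library(ggplot2)
    
    # Example data: 100 draws from a standard normal distribution
    dist = data.frame(dist = rnorm(100))
    
    plot1 <- ggplot(data = dist) + 
      # after_stat(density) is the current spelling of ..density..;
      # it puts the bars on the density scale instead of counts
      geom_histogram(mapping = aes(x = dist, y = after_stat(density)), fill = "steelblue", colour = "black", binwidth = 1) +
      ggtitle("Frequencies") +
      # Normal density using the sample mean and standard deviation
      stat_function(fun = dnorm, args = list(mean = mean(dist$dist), sd = sd(dist$dist)))
    
    plot1  # print the plot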
    

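If you would rather keep counts on the y-axis, a common alternative (not the approach used above) is to scale the normal density by the number of observations times the bin width, which approximates the expected count per bin. A minimal sketch; the simulated data (200 draws with mean 16 and sd 2) is a hypothetical stand-in for the question's dist data frame.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data standing in for the question's dist data frame
dist <- data.frame(dist = rnorm(200, mean = 16, sd = 2))

binwidth <- 1
n  <- nrow(dist)
mu <- mean(dist$dist)
s  <- sd(dist$dist)

plot2 <- ggplot(dist, aes(x = dist)) +
  geom_histogram(binwidth = binwidth, fill = "steelblue", colour = "black") +
  # Expected count in a bin of width binwidth is roughly density * n * binwidth
  stat_function(
    fun = function(x) dnorm(x, mean = mu, sd = s) * n * binwidth,
    colour = "red"
  ) +
  ggtitle("Frequencies")

plot2
```

The trade-off is that the curve must be rescaled by hand whenever the bin width or sample size changes, whereas the density-scaled histogram needs no extra arithmetic.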
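To check normality as suggested in the side note, a Q-Q plot can be drawn in base R with qqnorm() and qqline(), or in ggplot2 with stat_qq() and stat_qq_line(); points lying close to the reference line suggest the data are roughly normal. A minimal sketch using simulated (hypothetical) data in place of the question's dist data frame.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data standing in for the question's dist data frame
dist <- data.frame(dist = rnorm(100, mean = 16, sd = 2))

# Base R: sample quantiles against theoretical normal quantiles
qqnorm(dist$dist)
qqline(dist$dist, col = "red")

# ggplot2 equivalent (stat_qq_line() requires ggplot2 >= 3.0.0)
qq_plot <- ggplot(dist, aes(sample = dist)) +
  stat_qq() +
  stat_qq_line(colour = "red")

qq_plot
```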