Tags: r, plot, histogram, kernel-density, density-plot

Legend in R plot that has kernel density, normal density and a histogram?


I am a beginner in R and have a question about adding a legend to three plots I am working on. I am using R's built-in iris dataset. I wrote what I thought would make the legend appear, but only the plots are drawn. I have attached images of the plots below. Can someone please tell me what I need to do to make the legend appear on each plot? Thank you in advance.

setosa_length <- iris$Sepal.Length[iris$Species == "setosa"]
hist(setosa_length, freq=FALSE)
x <- seq(4, 8, length.out=100)
y <- with(iris, dnorm(x, mean(setosa_length), sd(setosa_length)))
lines(x, y, col="red")
lines(density(setosa_length), col="blue")
legend(1, 95, legend=c("Normal Density", "Kernel Density"), col=c("red", 
"blue"), lty=1:2, cex=0.5)

versicolor_length <- iris$Sepal.Length[iris$Species == "versicolor"]
hist(versicolor_length, freq=FALSE)
x <- seq(4, 8, length.out=100)
y <- with(iris, dnorm(x, mean(versicolor_length), sd(versicolor_length)))
lines(x, y, col="red")
lines(density(versicolor_length), col="blue")
legend(1, 95, legend=c("Normal Density", "Kernel Density"), col=c("red", 
"blue"), lty=1:2, cex=0.5)

virginica_length <- iris$Sepal.Length[iris$Species == "virginica"]
hist(virginica_length, freq=FALSE)
x <- seq(4, 8, length.out=100)
y <- with(iris, dnorm(x, mean(virginica_length), sd(virginica_length)))
lines(x, y, col="red")
lines(density(virginica_length), col="blue")
legend(1, 95, legend=c("Normal Density", "Kernel Density"), col=c("red", 
"blue"), lty=1:2, cex=0.5)

Solution

  • I would highly recommend learning some tidyverse, as it makes most of these problems go away and leads to much more readable code.

    library(tidyverse)
    
    # calculate the normal densities for the three species
    x <- seq(4, 8, length.out=100)
    iris.norm <- group_by(iris, Species) %>%
      summarize(mean = mean(Sepal.Length),
                sd = sd(Sepal.Length)) %>%
      mutate(data = map2(mean, sd, ~ data.frame(Sepal.Length = x,
                                                density = dnorm(x, .x, .y)))) %>%
      unnest(data)
    
    # plot histograms and densities on top of each other
    ggplot(iris) + 
      geom_histogram(aes(x = Sepal.Length, y = after_stat(density)),
                     color = "black", fill = "white", bins = 8) +
      geom_line(aes(x = Sepal.Length, color = "Kernel Density"),
                stat = "density") +
      geom_line(data = iris.norm,
                aes(x = Sepal.Length, y = density, color = "Normal Density")) +
      facet_wrap(~Species, ncol = 1) +
      theme_minimal()
    

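If you would rather stay in base R, the likely reason the legend never shows up is its position. legend(1, 95, ...) places the box at x = 1, y = 95 in data coordinates, while the histogram's x axis covers roughly 4 to 8 and its density axis stays below about 2, so the legend is drawn outside the plot region and clipped. Below is a minimal sketch of one possible fix that uses a keyword position instead. The helper plot_species_density() is a hypothetical name introduced here for illustration, and lty = 1 is an assumption made so that the legend matches the two solid lines.

```r
# Sketch: histogram, normal density, and kernel density for one species,
# with a legend placed by keyword so it always lands inside the plot region.
# plot_species_density() is a hypothetical helper, not part of the original post.
plot_species_density <- function(species) {
  len <- iris$Sepal.Length[iris$Species == species]
  hist(len, freq = FALSE, main = species, xlab = "Sepal length")

  # Normal density using the sample mean and standard deviation
  x <- seq(4, 8, length.out = 100)
  lines(x, dnorm(x, mean(len), sd(len)), col = "red")

  # Kernel density estimate
  lines(density(len), col = "blue")

  # "topright" is resolved relative to the plot region, unlike (1, 95),
  # which lies far outside the axis ranges of these plots.
  lg <- legend("topright",
               legend = c("Normal Density", "Kernel Density"),
               col = c("red", "blue"), lty = 1, cex = 0.8)
  invisible(lg)
}

# Draw the three species below each other, then restore the old layout
op <- par(mfrow = c(3, 1))
for (s in levels(iris$Species)) plot_species_density(s)
par(op)
```

If the legend covers part of a curve, other keywords such as "topleft" work the same way, or you can pass explicit x and y values that fall inside the ranges reported by par("usr").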