Tags: r, ggplot2, histogram, categories, density-plot

Stratifying a density plot by different groups using ggplot2 in R


I have a data frame in R called x that has hundreds of rows. Each row is a person. I have two variables, Height, which is continuous, and Country, which is a factor. I want to plot a smoothed histogram of all of the heights of the individuals. I want to stratify it by Country. I know that I can do that with the following code:

library(ggplot2)
ggplot(x, aes(x=Height, colour = (Country == "USA"))) + geom_density()

This plots everyone from the USA as one color (true) and everyone from any other country as the other color (false). However, what I would really like to do is plot everyone from the USA in one color and everyone from Oman, Nigeria, and Switzerland as the other color. How would I adapt my code to do this?


Solution

  • I used the built-in iris data set for illustration:

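    head(iris)            # preview the built-in data set
    table(iris$Species)   # number of rows per species
    df <- iris
    # New grouping column: setosa -> "blue", virginica -> "red", anything else -> ""
    df$Species2 <- ifelse(df$Species == "setosa", "blue", 
                   ifelse(df$Species == "virginica", "red", ""))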
    
    library(ggplot2)
    p <- ggplot(df, aes(x = Sepal.Length, colour = (Species == "setosa")))
    p + geom_density() # Your example
    

    [Figure: example with true and false]
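
    The TRUE/FALSE legend produced by the logical comparison can be hard to read. As a minimal sketch of one way around that (the column name IsSetosa and the label text are made up for this illustration), the comparison can be stored as a factor with descriptive labels before plotting:

    ```r
    library(ggplot2)

    df <- iris
    # Turn the TRUE/FALSE comparison into a factor with readable labels.
    # IsSetosa is a hypothetical column name chosen for this sketch.
    df$IsSetosa <- factor(df$Species == "setosa",
                          levels = c(TRUE, FALSE),
                          labels = c("setosa", "other species"))

    # Same density plot as before, but the legend now shows the labels above.
    p <- ggplot(df, aes(x = Sepal.Length, colour = IsSetosa))
    p + geom_density() + labs(colour = "Group")
    ```

    The plot itself is unchanged; only the legend entries and title differ.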

    # Now let's choose the other created column
    p <- ggplot(df, aes(x = Sepal.Length, colour = Species2))
    p + geom_density() + facet_wrap(~Species2)
    

    [Figure: example with extra column]

    Edit: to get rid of the "countries" that you don't want in the plot, subset them out of the data frame you pass to ggplot. (Note that the legend labels "blue" and "red" don't exactly match the colours that are actually drawn, but that can be changed within the data frame itself.)

    p <- ggplot(df[df$Species2 %in% c("blue", "red"),], aes(x = Sepal.Length, colour = Species2))
    p + geom_density() + facet_wrap(~Species2)
    

    [Figure: example with filtered data frame]

    And to overlay the lines, just take out the facet_wrap:

    p + geom_density() 
    

    [Figure: example without facet_wrap]
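
    Applied to the data in the question, the same idea might look like the sketch below. The data frame x here is simulated as a stand-in for the real one, and "Canada" (as an example of an unwanted country), the Group column, and the group labels are assumptions made for illustration:

    ```r
    library(ggplot2)

    # Hypothetical stand-in for the asker's data frame `x`; values are simulated.
    set.seed(1)
    x <- data.frame(
      Height  = rnorm(400, mean = 170, sd = 10),
      Country = factor(sample(c("USA", "Oman", "Nigeria", "Switzerland", "Canada"),
                              400, replace = TRUE))
    )

    # Label each row with its group; countries in neither group become NA.
    x$Group <- ifelse(x$Country == "USA", "USA",
               ifelse(x$Country %in% c("Oman", "Nigeria", "Switzerland"),
                      "Oman/Nigeria/Switzerland", NA))

    # Keep only rows that belong to one of the two groups,
    # then overlay one density curve per group.
    x_sub <- x[!is.na(x$Group), ]
    p <- ggplot(x_sub, aes(x = Height, colour = Group)) + geom_density()
    p
    ```

    Using %in% keeps the list of countries in one place, so adding or removing a country from the second group only means editing that vector.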