I have a data frame in R called x that has hundreds of rows. Each row is a person. I have two variables, Height, which is continuous, and Country, which is a factor. I want to plot a smoothed histogram of all of the heights of the individuals, stratified by Country. I know that I can do that with the following code:
library(ggplot2)
ggplot(x, aes(x=Height, colour = (Country == "USA"))) + geom_density()
This plots everyone from the USA in one color (TRUE) and everyone from any other country in the other color (FALSE). However, what I would really like to do is plot everyone from the USA in one color and everyone from Oman, Nigeria, and Switzerland in the other color. How would I adapt my code to do this?
Here is an illustration using the built-in iris data set, with Species standing in for Country:
head(iris)           # Species plays the role of Country
table(iris$Species)  # 50 rows each of setosa, versicolor, virginica
df <- iris
# New grouping column: setosa -> "blue", virginica -> "red", versicolor -> ""
df$Species2 <- ifelse(df$Species == "setosa", "blue",
                      ifelse(df$Species == "virginica", "red", ""))
library(ggplot2)
p <- ggplot(df, aes(x = Sepal.Length, colour = (Species == "setosa")))
p + geom_density() # Your example
# Now let's choose the other created column
p <- ggplot(df, aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)
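Because the nested ifelse() sends every species that is neither setosa nor virginica to an empty string, versicolor still appears as a third, unlabelled group (and facet). A quick cross-tabulation makes that mapping visible; this is just a sanity check that rebuilds the same Species2 column so it runs on its own:

```r
df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue",
                      ifelse(df$Species == "virginica", "red", ""))

# Rows: original species; columns: new labels.
# versicolor ends up in the empty-string column.
tab <- table(df$Species, df$Species2)
print(tab)
```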
Edit: to get rid of the "countries" that you don't want in the plot, just subset them out of the data frame you pass to ggplot() (note that the legend labels "blue" and "red" don't match the colours ggplot actually draws, but that can be changed within the data frame itself):
p <- ggplot(df[df$Species2 %in% c("blue", "red"),], aes(x = Sepal.Length, colour = Species2))
p + geom_density() + facet_wrap(~Species2)
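As an alternative to relabelling inside the data frame, ggplot2's scale_colour_manual() can map each label to the colour it names, so the legend and the lines agree. A minimal sketch, rebuilding the iris-based Species2 column so it runs on its own (the facet is dropped for brevity):

```r
library(ggplot2)

df <- iris
df$Species2 <- ifelse(df$Species == "setosa", "blue",
                      ifelse(df$Species == "virginica", "red", ""))
sub <- df[df$Species2 %in% c("blue", "red"), ]  # drop the "" (versicolor) rows

# Named vector: each label value is drawn in the colour it names
p <- ggplot(sub, aes(x = Sepal.Length, colour = Species2)) +
  geom_density() +
  scale_colour_manual(values = c(blue = "blue", red = "red"))
p
```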
And to overlay the lines, just take out the facet_wrap:
p + geom_density()
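Translated back to the question's variables, the same idea is to subset with %in% and keep the original Country == "USA" colour mapping. The data frame below is made-up toy data standing in for the asker's x (the "Canada" rows are hypothetical extras that should be filtered out); only the Height and Country column names come from the question:

```r
library(ggplot2)

# Made-up stand-in for the asker's data frame `x`
set.seed(42)
x <- data.frame(
  Height  = rnorm(500, mean = 170, sd = 10),
  Country = factor(sample(c("USA", "Oman", "Nigeria", "Switzerland", "Canada"),
                          500, replace = TRUE))
)

# Keep only the USA and the three comparison countries
x_sub <- x[x$Country %in% c("USA", "Oman", "Nigeria", "Switzerland"), ]

# Two curves: USA (TRUE) vs Oman, Nigeria and Switzerland pooled (FALSE)
p <- ggplot(x_sub, aes(x = Height, colour = Country == "USA")) +
  geom_density()
p
```

Subsetting first means the FALSE curve pools only the three named countries rather than every non-USA row.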