Tags: r, ggplot2, histogram, fill, density-plot

R GGplot histograms and density with multiple fill variables


I want to plot three overlapping, semi-transparent histograms (one color per factor level) with their density curves overlaid on the same graph, without the bins for different factor levels being automatically stacked. I want the histograms to "wrap", meaning that the density for each level is calculated independently, but I want them all in the same panel so the distributions can be compared directly (hence no facet_wrap!). I can get this to work by adding a separate geom_histogram() call for each level, but that becomes a pain with many levels. I also tried grouping the data by the factor, but the resulting graph is still stacked. The reason I want this is that I will be examining many individuals and their distributions, trying to identify unusual individuals or subgroups with a different pattern, so I really do want all of the histograms drawn on top of one another in a single plot.

library(ggplot2)
library(dplyr)  # provides %>% and filter(), used in the second attempt below

# Simulate some data
color = c("yellow", "brown", "green")
n_per_factor = c(45, 50, 48)
flies = c(rgamma(n_per_factor[1], shape = 40, scale = 3.5),
          rgamma(n_per_factor[2], shape = 40, scale = 3),
          rgamma(n_per_factor[3], shape = 40, scale = 2.5))
bananas = data.frame(color = rep.int(color, times = n_per_factor),
                     flies = flies)

# Attempt 1: single geom_histogram() call (bins are stacked, which I don't want)
ggplot(data = bananas, aes(x = flies, after_stat(density), 
                           fill = color))+
  geom_histogram(alpha = 0.50)+
  geom_density(alpha = 0.50)+
  xlab("Flies per Banana")+
  ylab("Density")

# Attempt 2: one geom_histogram() call per level (not stacked, as I want, but too many lines)
ggplot(data = bananas, aes(x = flies, after_stat(density), 
                           fill = color))+
  geom_histogram(data = bananas %>% filter(color == "yellow"),
                 aes(x = flies, after_stat(density), 
                     fill = color), alpha = 0.50)+
  geom_histogram(data = bananas %>% filter(color == "green"),
                 aes(x = flies, after_stat(density), 
                     fill = color), alpha = 0.50)+
  geom_histogram(data = bananas %>% filter(color == "brown"),
                 aes(x = flies, after_stat(density), 
                     fill = color), alpha = 0.50)+
  geom_density(alpha = 0.50)+
  xlab("Flies per Banana")+
  ylab("Density")

# Attempt 3: added group = color (still stacked)
ggplot(data = bananas, aes(x = flies, after_stat(density), 
                           group = color, fill = color))+
  geom_histogram(alpha = 0.50)+
  geom_density(alpha = 0.50)+
  xlab("Flies per Banana")+
  ylab("Density")

[Image: Unhappy_stacked_histo (the stacked result I don't want)]

[Image: Happy_stacked_histo (the non-stacked result I want)]


Solution

  • Set position = "identity" in geom_histogram() (its default is "stack") and in geom_density(). The histograms and densities are then overlaid on top of each other instead of being stacked.

    library(ggplot2)
    
    ggplot(data = bananas, aes(x = flies, fill = color, group = color)) +
      geom_histogram(aes(y = after_stat(density)), alpha = 0.5, position = "identity", bins = 30) +
      geom_density(alpha = 0.5, position = "identity") +
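      # Named values, so each level gets its own color regardless of level order
      scale_fill_manual(values = c(brown = "brown", green = "green", yellow = "yellow")) +
      # scale_fill_identity(guide = "legend") +  # alternative to scale_fill_manual()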
      xlab("Flies per Banana") +
      ylab("Density")
    

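To check that position = "identity" really removes the stacking, the computed bar coordinates can be inspected with layer_data(). The following is only a minimal sketch: it re-simulates data shaped like the question's bananas, and the seed, the helper make_plot() and the variable names stacked, overlaid and areas are assumptions introduced here for illustration.

```r
library(ggplot2)

set.seed(1)  # assumption: seed added only so the sketch is reproducible

# Same structure as the simulated data in the question
bananas <- data.frame(
  color = rep(c("yellow", "brown", "green"), times = c(45, 50, 48)),
  flies = c(rgamma(45, shape = 40, scale = 3.5),
            rgamma(50, shape = 40, scale = 3),
            rgamma(48, shape = 40, scale = 2.5))
)

# Hypothetical helper: the same density histogram, varying only `position`
make_plot <- function(position) {
  ggplot(bananas, aes(x = flies, fill = color)) +
    geom_histogram(aes(y = after_stat(density)), bins = 30,
                   alpha = 0.5, position = position)
}

# layer_data() returns the data frame ggplot2 computed for a layer
stacked  <- layer_data(make_plot("stack"), 1)
overlaid <- layer_data(make_plot("identity"), 1)

# With "identity" every bar should start at zero; with "stack" some bars
# should start on top of another level's bar.
all(overlaid$ymin == 0)
any(stacked$ymin > 0)

# Density is computed per level: the bar areas of each level should sum to 1
areas <- tapply(overlaid$density * (overlaid$xmax - overlaid$xmin),
                overlaid$group, sum)
areas
```

If the first two checks come back TRUE and each entry of areas is 1 (up to floating-point error), the histograms are overlaid and each level's density is normalized independently, which is the behavior asked for in the question.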