r ggplot2 boxplot normal-distribution

Plotting a vertical normal distribution next to a box plot in R


I'm trying to plot box plots with the normal distribution of the underlying data drawn vertically next to each box, like this: [image: What I'd Like to Get]

This is what I currently have, graphed from an Excel sheet imported into R: [image: Current Box Plots]

And the code associated with them:

set.seed(12345)
library(ggplot2)
library(ggthemes)
library(ggbeeswarm)


#graphing boxplot and quasirandom scatterplot together
ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) + 
  geom_quasirandom(shape=20, fill="gray", color = "gray") +
  geom_boxplot(fill="NA", color = c("red4", "orchid4", "dark green", "blue"), 
               outlier.color = "NA") +
  theme_hc()

Is this possible in ggplot2, or in R in general? Or is it only feasible with something like OriginLab (where the first picture came from)?


Solution

  • There are a few ways to do this. To gain full control over the look of the plot, I would just calculate the curves and plot them. Here's some sample data that's close to your own and shares the same names, so it should be directly applicable:

    set.seed(12345)
    
    X8_17_20_R_20_60 <- data.frame(
      Diameter = rnorm(4000, rep(c(41, 40, 42, 40), each = 1000), sd = 6),
      Type = rep(c("AvgFeret", "CalcDiameter", "Feret", "MinFeret"), each = 1000))
    

    Now we build a small data frame of normal density curves, one per group, using each group's sample mean and standard deviation:

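    # For each group, evaluate the fitted normal density along the data range.
    # The density is scaled by 5 and shifted 0.15 to the left of the group's
    # x position n (the discrete axis places the four groups at 1, 2, 3, 4).
    df <- do.call(rbind, mapply(function(d, n) {
      y <- seq(min(d), max(d), length.out = 1000)
      data.frame(x = n - 5 * dnorm(y, mean(d), sd(d)) - 0.15, y = y, z = n)
    }, with(X8_17_20_R_20_60, split(Diameter, Type)), 1:4, SIMPLIFY = FALSE))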
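
    The constants 5 and 0.15 in the snippet above look hand-tuned: 5 stretches the density so it is visible on a discrete axis where groups sit one unit apart, and 0.15 nudges the curve to the left. As a rough check, assuming a standard deviation of about 6 as in the sample data, this sketch computes how far left of its group the peak of the curve lands (the variable names are illustrative only):

    ```r
    # The peak of a normal density is 1 / (sd * sqrt(2 * pi)).
    sd_assumed <- 6                  # assumption: matches the sample data's sd
    peak <- dnorm(0, mean = 0, sd = sd_assumed)

    scale_factor <- 5                # same constant as in the snippet above
    gap <- 0.15                      # same offset as in the snippet above

    # Horizontal distance from the group's axis position to the curve's peak
    reach <- scale_factor * peak + gap
    print(round(c(peak = peak, reach = reach), 3))
    ```

    With these values the peak reaches a little under half a unit to the left, so it stays short of the neighbouring group's axis position. Data with a smaller spread produce a taller density, so the scale factor would need to shrink accordingly.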
    

    Finally, we draw your plot and add a geom_path with the new data.

    library(ggplot2)
    library(ggthemes)
    library(ggbeeswarm)
    
    ggplot(X8_17_20_R_20_60, aes(Type, Diameter)) + 
      geom_quasirandom(shape = 20, fill = "gray", color = "gray") +
      geom_boxplot(fill = NA, aes(color = Type), outlier.color = NA) +
      scale_color_manual(values = c("red4", "orchid4", "dark green", "blue")) +
      # linewidth replaces size for lines in ggplot2 >= 3.4.0; use size = 1 on older versions
      geom_path(data = df, aes(x = x, y = y, group = z), linewidth = 1) +
      theme_hc()
    

    Created on 2020-08-21 by the reprex package (v0.3.0)
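
    If the number of groups changes, the hard-coded 1:4 in the curve-building step has to change with it. A minimal sketch of a reusable version follows; normal_curves and its arguments scale, gap, and n_points are hypothetical names introduced here, not part of the original answer, and the demo data are simulated:

    ```r
    # Hypothetical helper: one fitted normal curve per group, positioned to the
    # left of the group's slot on a discrete x axis (slots are 1, 2, 3, ...).
    normal_curves <- function(values, groups, scale = 5, gap = 0.15, n_points = 200) {
      groups <- factor(groups)       # level order matches ggplot2's default axis order
      pieces <- lapply(seq_along(levels(groups)), function(i) {
        d <- values[groups == levels(groups)[i]]
        y <- seq(min(d), max(d), length.out = n_points)
        data.frame(x = i - scale * dnorm(y, mean(d), sd(d)) - gap, y = y, z = i)
      })
      do.call(rbind, pieces)
    }

    # Simulated demo data with three groups (assumption, for illustration only)
    set.seed(12345)
    demo <- data.frame(
      Diameter = rnorm(300, rep(c(41, 40, 42), each = 100), sd = 6),
      Type = rep(c("A", "B", "C"), each = 100))

    curves <- normal_curves(demo$Diameter, demo$Type)
    str(curves)
    ```

    The result can be passed to geom_path(data = curves, aes(x = x, y = y, group = z)) in the same way as df above.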