r ggplot2 histogram

Illustrate standard deviation in histogram


Consider the following simple example:

# E. Musk in Grunheide 
set.seed(22032022) 

# generate random numbers 
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

# empirical sd 
sd(randomNumbers)
#> [1] 10.34369

# histogram 
hist(randomNumbers, probability = TRUE, main = "", breaks = 50)

# just for illustration purposes 
###
# empirical density 
lines(density(randomNumbers), col = 'black', lwd = 2)
# theoretical density 
curve(dnorm(x, mean = 10, sd = 10), col = "blue", lwd = 2, add = TRUE)
###

Created on 2022-03-22 by the reprex package (v2.0.1)

Question: Is there a nice way to illustrate the empirical standard deviation (sd) in the histogram by color? For example, by drawing the inner bars in a different color, or by marking the range of the sd as an interval, i.e., [mean +/- sd], on the x-axis?

Note: if ggplot2 provides an easy solution, a suggestion along those lines would also be much appreciated.


Solution

  • This is a ggplot solution similar to Benson's answer, except that we precompute the histogram and use geom_col, so we don't get any of the unwelcome stacking at the sd boundary:

    # E. Musk in Grunheide 
    set.seed(22032022) 
    
    # generate random numbers 
    randomNumbers <- rnorm(n=1000, mean=10, sd=10)
    
    h <- hist(randomNumbers, breaks = 50, plot = FALSE)
    
    lower <- mean(randomNumbers) - sd(randomNumbers)
    upper <- mean(randomNumbers) + sd(randomNumbers)
    
    df <- data.frame(x = h$mids, y = h$density, 
                     fill = h$mids > lower & h$mids < upper)
    
    library(ggplot2)
    
    ggplot(df) +
      # take the bar width from the histogram breaks rather than hard-coding 1
      geom_col(aes(x, y, fill = fill), width = diff(h$breaks)[1],
               color = 'black') +
      geom_density(data = data.frame(x = randomNumbers), 
                   aes(x = x, color = 'Actual density'),
                   key_glyph = 'path') +
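      # normal density evaluated at the sample mean and sd (a fitted normal curve)
      geom_function(fun = function(x) {
        dnorm(x, mean = mean(randomNumbers), sd = sd(randomNumbers)) },
        aes(color = 'Theoretical density')) +
      scale_fill_manual(values = c('TRUE' = '#FF374A', 'FALSE' = 'gray'), 
                        name = 'Within 1 SD') +
      # named values, so the colors do not depend on the ordering of the labels
      scale_color_manual(values = c('Actual density' = 'black',
                                    'Theoretical density' = 'blue'),
                         name = 'Density lines') +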
      labs(x = 'Value of random number', y = 'Density') +
      theme_minimal()
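
The question also asks about marking the interval [mean +/- sd] on the x-axis, which the plot above does not show. One possible way to do that in ggplot2 is sketched below. It is a minimal, self-contained example rather than part of the original answer: the names `m`, `s` and `p` are only illustrative, and the `linewidth` argument assumes ggplot2 3.4.0 or later (older versions use `size` for line thickness).

```r
library(ggplot2)

set.seed(22032022)
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

# empirical mean and standard deviation (illustrative names)
m <- mean(randomNumbers)
s <- sd(randomNumbers)

p <- ggplot(data.frame(x = randomNumbers), aes(x)) +
  # histogram on the density scale
  geom_histogram(aes(y = after_stat(density)), bins = 50,
                 fill = "gray", color = "black") +
  # dashed vertical lines at mean - sd and mean + sd
  geom_vline(xintercept = c(m - s, m + s), linetype = "dashed") +
  # thick horizontal bar along the x-axis spanning [mean - sd, mean + sd]
  # (linewidth assumes ggplot2 >= 3.4.0)
  annotate("segment", x = m - s, xend = m + s, y = 0, yend = 0,
           linewidth = 2, color = "#FF374A") +
  labs(x = "Value of random number", y = "Density") +
  theme_minimal()

p
```

The same `geom_vline` and `annotate` layers could presumably be added to the `geom_col` plot above as well, since they do not depend on how the bars are drawn.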
    

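
For completeness, the same idea can be sketched in base R graphics, which is what the question started from. This is a hedged example and not part of the original answer: it precomputes the histogram, classifies each bar by its midpoint (as in the ggplot solution), and relies on `plot()` for histogram objects accepting one color per bar. The names `m`, `s` and `inside` are illustrative.

```r
set.seed(22032022)
randomNumbers <- rnorm(n = 1000, mean = 10, sd = 10)

# empirical mean and standard deviation (illustrative names)
m <- mean(randomNumbers)
s <- sd(randomNumbers)

# compute the bins without plotting, then flag bars whose midpoint
# lies within one sd of the mean
h <- hist(randomNumbers, breaks = 50, plot = FALSE)
inside <- h$mids > m - s & h$mids < m + s

# draw the precomputed histogram on the density scale, one color per bar
plot(h, freq = FALSE, main = "", xlab = "randomNumbers",
     col = ifelse(inside, "#FF374A", "gray"))

# dashed vertical lines at the interval boundaries
abline(v = c(m - s, m + s), lty = 2)

# double-headed bar at the top of the plot spanning [mean - sd, mean + sd]
arrows(x0 = m - s, y0 = max(h$density), x1 = m + s, y1 = max(h$density),
       code = 3, angle = 90, length = 0.05, lwd = 2, col = "blue")
```

Because bars are classified by midpoint, a bar that straddles the sd boundary is colored as a whole; the dashed lines show where the exact boundary falls.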