Tags: r, ggplot2, binning, geom-histogram

Normalize geom_density for x-values of different magnitudes


I have a data set with two measurements (let's call them height and width) whose values differ by orders of magnitude, measured in two samples (a and b). I want to illustrate the variability of each measurement regardless of its absolute magnitude using a smoothed histogram, so I've been using geom_density. However, the density values that geom_density computes also differ from one another by orders of magnitude.

The dataset:

library(tidyverse)
set.seed(123)
sample_a = tibble(sample = "a",
                  height = rnorm(20, mean = 0.1, sd = 0.01),
                  width = rnorm(20, mean = 50, sd = 10)) %>% 
  pivot_longer(c(height, width), names_to = "parameter", values_to = "result")

set.seed(321)
sample_b = tibble(sample = "b",
                  height = rnorm(20, mean = 0.2, sd = 0.03),
                  width = rnorm(20, mean = 55, sd = 10)) %>% 
  pivot_longer(c(height, width), names_to = "parameter", values_to = "result")

data = bind_rows(sample_a, sample_b)

When I plot histograms with the sample count on the y axis, I can compare the magnitude and variability of each parameter in each sample:

data %>% 
  ggplot()+
  geom_histogram(aes(x = result))+
  facet_grid(sample~parameter, scales = "free_x")


However, when I use geom_density (even with y = after_stat(count), as suggested in this answer: Normalizing y-axis in histograms in R ggplot to proportion), the y-axis magnitudes still differ substantially between the two parameters:

data %>% 
  ggplot()+
  geom_density(aes(x = result, y = after_stat(count)))+
  facet_grid(sample~parameter, scales = "free_x")


How can I show results of such different magnitudes in a faceted plot using smoothed histograms?


Solution

  • We can use after_stat(scaled), which rescales each density estimate so that its maximum is 1:

    ggplot(data, aes(x = result)) + 
      geom_density(aes(y = after_stat(scaled))) +
      facet_grid(sample~parameter, scales = "free_x")
    

    To illustrate, here are the histogram and the density curve on the same plot:

    ggplot(data, aes(x = result)) + 
      geom_histogram(aes(y = after_stat(count)), colour = "black", fill = NA) +
      geom_density(aes(y = after_stat(scaled))) +
      facet_grid(sample~parameter, scales = "free_x")
    

    Or even better, use after_stat(ncount) to rescale the counts to a maximum of 1 as well, so the histogram and the density curve share the same y range:

    ggplot(data, aes(x = result)) + 
      geom_histogram(aes(y = after_stat(ncount)), colour = "black", fill = NA) +
      geom_density(aes(y = after_stat(scaled))) +
      facet_grid(sample~parameter, scales = "free_x")
    

    Created on 2023-11-01 with reprex v2.0.2
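
To see why the raw densities differ so much and what the two normalized statistics do, here is a minimal base R sketch. It assumes that stats::density() and hist() are reasonable stand-ins for what ggplot2 computes in each panel (the exact bandwidth and binning in ggplot2 may differ), and the helper name scaled_density is hypothetical. A density integrates to 1, so its peak grows as the spread of the data shrinks; dividing by the maximum removes that dependence.

```r
# Simulate one sample's measurements, as in the question (values are illustrative)
set.seed(123)
height <- rnorm(20, mean = 0.1, sd = 0.01)
width  <- rnorm(20, mean = 50,  sd = 10)

# Hypothetical helper: kernel density plus a copy rescaled to a maximum of 1,
# which is what after_stat(scaled) is documented to return
scaled_density <- function(x) {
  d <- density(x)
  data.frame(x = d$x, density = d$y, scaled = d$y / max(d$y))
}

h <- scaled_density(height)
w <- scaled_density(width)

# Raw peaks depend on the spread of the data, so they are far apart
c(height = max(h$density), width = max(w$density))

# After rescaling, both curves peak at exactly 1
c(height = max(h$scaled), width = max(w$scaled))

# The same idea for bin counts, analogous to after_stat(ncount)
counts <- hist(height, breaks = 30, plot = FALSE)$counts
ncount <- counts / max(counts)
max(ncount)
```

Because both statistics are rescaled to a maximum of 1 within each group, the curves show only the shape of each distribution; the information about absolute counts or densities is dropped from the y axis.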