As an example, here a stacked density chart based on diamonds
. A simple table suggests that "Fair" cut should be quite uncommon, and "Ideal" the most common, etc..., but the five cuts seem to occupy a more or less equal area/ proportion on the stacked charts. That is surprising to me.
This is probably a conceptual misunderstanding from my side and I'd be grateful for someone to help. So, what does stacked density exactly show?
library(tidyverse)
library(patchwork)
p1 <- diamonds %>%
ggplot() +
geom_density(aes(x = price, fill = cut), position = "stack")
p2 <- diamonds %>%
ggplot() +
geom_density(aes(x = price, fill = cut), position = "fill")
p1 / p2 + plot_layout(guides = "collect")
table(diamonds$cut)
#>
#> Fair Good Very Good Premium Ideal
#> 1610 4906 12082 13791 21551
Created on 2023-11-12 with reprex v2.0.2
The densities are being normalized (i.e. scaled to area 1) before being combined. If you don't want that, use after_stat(count)
, e.g.
library(tidyverse)
diamonds %>%
ggplot() +
geom_density(aes(x = price, y = after_stat(count),
fill = cut),
position = "stack")
Created on 2023-11-12 with reprex v2.0.2
Edited to add:
Here's how this works: geom_density()
always computes the normalized density for each group. Setting y = after_stat(count)
tells it to multiply the density by the count for the group, so you get "number of cases per unit of price
" instead of "probability per unit of price
", the usual units of the density.
If you'd like to keep the density scale, you can use
diamonds %>%
ggplot() +
geom_density(aes(x = price,
y = after_stat(count/sum(tapply(n, group, unique))),
fill = cut),
position = "stack") +
ylab("Density")
This will actually show "subdensities", where the total area is 1, and the area of each region is the proportion of cases of that class. (The expression sum(tapply(n, group, unique))
is a fancy way to compute nrow(diamonds)
without referring to the particular dataset. It says: find the unique values of n
in each group, and add them up. Since n
is the same for all entries in the group, that gives the overall total. Thanks to @Stefan for the trick.)