I have a dataset that contains a continuous variable for which I want to display the density and a grouping variable that I want to use to split the density. When the sizes of the groups are similar, the density plot comes out fine:
library(ggplot2)
data("lalonde", package = "cobalt")
ggplot(lalonde, aes(x = educ, fill = factor(treat))) +
geom_density(alpha = .5)
Now, let's say my groups are of different sizes, but the same relative frequencies for each variable are present within each group. In the example below, I simply replicate the rows of one of the groups many times while keeping the other group as it was.
bigll <- do.call("rbind", c(list(lalonde), replicate(100,
lalonde[lalonde$treat == 0,], simplify = FALSE)))
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
geom_density(alpha = .5)
The density for the larger group now appears much less smooth. Is there a way to adjust the smoothness parameters by group so that the second plot looks more like the first? That is, can I set the smoothness parameters to the lowest common denominator so that the densities can be compared visually more easily?
With the help of @Carlos and others, I found what I was looking for. It's true that the smoothness of a density should typically reflect the size of the sample, as Carlos mentioned, but in my case I wanted the two densities to share the same bandwidth; in particular, I wanted both to use the bandwidth of the smaller group. The default bandwidth in ggplot2 is bw.nrd0; I can compute that on the smaller group and then set it as the global bandwidth for my plot.
bw <- bw.nrd0(bigll$educ[bigll$treat == 1])
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
geom_density(alpha = .5, bw = bw)
That definitely obscures some of the detail in the larger distribution, but for my purposes this was sufficient.
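To see why the replicated group looked rougher, and to pick the shared bandwidth without hard-coding which group is smaller, here is a minimal sketch. It uses simulated stand-in data rather than lalonde, and the names df, x, g, bw_by_group, smallest, and bw_shared are all hypothetical, introduced only for illustration.

```r
# Hypothetical stand-in data: one small group, plus the same values
# replicated 101 times to form a much larger group.
set.seed(1)
small <- round(rnorm(200, mean = 10, sd = 2))
df <- data.frame(
  x = c(small, rep(small, times = 101)),
  g = rep(c("small", "big"), times = c(length(small), 101 * length(small)))
)

# Default (bw.nrd0) bandwidth computed separately within each group.
# bw.nrd0 shrinks with n^(-1/5), so the larger group gets a narrower kernel.
bw_by_group <- tapply(df$x, df$g, bw.nrd0)

# Find the group with the fewest rows and take its bandwidth as the shared one.
smallest <- names(which.min(table(df$g)))
bw_shared <- bw_by_group[[smallest]]

print(bw_by_group)
print(bw_shared)
```

The resulting bw_shared can then be passed as geom_density(bw = bw_shared), in the same way bw is used above.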