How can I group a density plot and have the density of each group sum to one, when using weighted data?
The ggplot2 help for geom_density() suggests a hack for using weighted data: dividing the weights by their sum. But when the data are grouped, this means that the combined density of all groups totals one, rather than the density of each group. I would like the density of each group to total one.
I have found two clumsy ways to do this. The first is to treat each group as a separate dataset:
library(ggplot2)
library(ggplot2movies) # load the movies dataset
m <- ggplot()
# one layer per group; sum(votes) is taken over that layer's subset only
m + geom_density(data = movies[movies$Action == 0, ], aes(rating, weight = votes/sum(votes)),
                 fill = NA, colour = "black") +
  geom_density(data = movies[movies$Action == 1, ], aes(rating, weight = votes/sum(votes)),
               fill = NA, colour = "blue")
Obvious disadvantages are the manual handling of factor levels and aesthetics. I also tried using the windowing functionality of the data.table package to create a new column holding the total votes per Action group, and dividing by that instead:
library(data.table)
movies.dt <- data.table(movies)
setkey(movies.dt, Action)
# total votes per Action group, repeated on every row of that group
movies.dt[, votes.per.group := sum(votes), by = Action]
m <- ggplot(movies.dt, aes(x = rating, weight = votes/votes.per.group,
                           group = Action, colour = Action))
m + geom_density(fill = NA)
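What the per-group total step computes can be illustrated on a small hypothetical table (films and its columns are made up for illustration) with base R's ave(), which applies a function within each group and repeats the result on every row of that group. Dividing by that column gives weights that total one inside each group:

```r
# Hypothetical toy data: votes for films in two Action groups
films <- data.frame(
  Action = c(0, 0, 0, 1, 1),
  votes  = c(10, 30, 60, 5, 15)
)

# sum() within each Action group, recycled to every row of that group,
# analogous to data.table's := with by = Action
films$votes.per.group <- ave(films$votes, films$Action, FUN = sum)

# per-group weights: 0.1, 0.3, 0.6 for Action 0 and 0.25, 0.75 for Action 1
films$w <- films$votes / films$votes.per.group

films
```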
Are there neater ways to do this? Because my tables are large, I'd rather not replicate rows according to their weights just so that I can use frequencies.
Using dplyr, the per-group total can be computed with a grouped mutate() and piped straight into ggplot():
library(dplyr)
library(ggplot2)
library(ggplot2movies)
movies %>%
  group_by(Action) %>%
  mutate(votes.grp = sum(votes)) %>%   # total votes within each Action group
  ggplot(aes(x = rating, weight = votes/votes.grp, group = Action, colour = Action)) +
  geom_density()
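As a quick sanity check, here is a sketch on hypothetical toy data (toy and weight_check are illustrative names, not part of the answer) confirming that votes/votes.grp totals one within each Action group after the grouped mutate():

```r
library(dplyr)

# Hypothetical toy data: a 0/1 group and a count-like weight
toy <- data.frame(Action = c(0, 0, 0, 1, 1),
                  votes  = c(10, 30, 60, 5, 15))

weight_check <- toy %>%
  group_by(Action) %>%
  mutate(votes.grp = sum(votes)) %>%              # same step as in the answer
  summarise(total_weight = sum(votes / votes.grp)) # one row per Action group

weight_check
```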