r ggplot2 density-plot

Making the density of each group sum to one in a weighted geom_density


How can I group a density plot and have the density of each group sum to one, when using weighted data?

The ggplot2 help for geom_density() suggests a hack for using weighted data: dividing by the sum of the weights. But when grouped, this means that the combined density of the groups totals one. I would like the density of each group to total one.
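
To make the problem concrete, here is a minimal sketch with made-up weights (the `df` data frame and its values are hypothetical, not taken from the movies data). It only shows what dividing by the overall sum does to the per-group totals:

```r
# Made-up weights for two groups (not the movies data)
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  w     = c(1, 3, 2, 2, 4)
)

# The documented hack: divide every weight by the sum of all weights
df$w.global <- df$w / sum(df$w)

# The whole table sums to one...
sum(df$w.global)

# ...but each group only receives its share of that total
group.share <- tapply(df$w.global, df$group, sum)
group.share
```

Here group a carries one third of the total weight and group b two thirds, so neither group's density would integrate to one on its own.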

I have found two clumsy ways to do this. The first is to treat each group as a separate dataset:

library(ggplot2)
library(ggplot2movies) # load the movies dataset

m <- ggplot()
m + geom_density(data = movies[movies$Action == 0, ], aes(rating, weight = votes/sum(votes)), fill=NA, colour="black") +
    geom_density(data = movies[movies$Action == 1, ], aes(rating, weight = votes/sum(votes)), fill=NA, colour="blue")

Obvious disadvantages are the manual handling of factor levels and aesthetics. I also tried using the windowing functionality of the data.table package to create a new column for the total votes per Action group, dividing by that instead:

library(data.table)

movies.dt <- data.table(movies)
setkey(movies.dt, Action)
# total votes within each Action group, added as a new column by reference
movies.dt[, votes.per.group := sum(votes), by = Action]
m <- ggplot(movies.dt, aes(x=rating, weight=votes/votes.per.group, group = Action, colour = Action))
m + geom_density(fill=NA)

Are there neater ways to do this? Because of the size of my tables, I'd rather not replicate rows by their weighting for the sake of using frequency.


Solution

  • Using dplyr

    library(dplyr)
    library(ggplot2)
    library(ggplot2movies)
    
    movies %>% 
      group_by(Action) %>% 
      mutate(votes.grp = sum(votes)) %>% 
      ggplot(aes(x=rating, weight=votes/votes.grp, group = Action, colour = Action)) +
      geom_density()
    

    [Figure: graph output by the code]
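
The same per-group normalisation can probably also be done without an extra package, using base R's `ave()`. The following is only a sketch: `group_weights` is a hypothetical helper name, and the `toy` data frame is made up to stand in for the `rating`, `votes` and `Action` columns so that the block runs on its own.

```r
# Hypothetical helper: rescale weights so they sum to one within each group
group_weights <- function(w, g) {
  w / ave(as.numeric(w), g, FUN = sum)
}

# Made-up data standing in for rating / votes / Action
toy <- data.frame(
  rating = c(2, 5, 7, 4, 6, 9),
  votes  = c(10, 30, 60, 5, 5, 40),
  Action = c(0, 0, 0, 1, 1, 1)
)
toy$w <- group_weights(toy$votes, toy$Action)

# Sanity check: the weights should sum to one within each group
tapply(toy$w, toy$Action, sum)

# Map the normalised column to the weight aesthetic
# (the plot is drawn only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = rating, weight = w, colour = factor(Action))) +
    geom_density()
  print(p)
}
```

With the real data, the equivalent step would be `movies$w <- group_weights(movies$votes, movies$Action)`. Wrapping `Action` in `factor()` is a design choice made here so that ggplot2 treats it as a discrete grouping variable and draws one curve per level.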