ggplot geom_bar plot percentages by group and facet_wrap


I want to plot several categories on a single graph so that the percentages within each category add up to 100%. For example, when plotting male versus female, the bars for each group (male or female) should add up to 100%. With the code below, the percentages are instead computed across all groups and both facets: all the bars in the left-hand and right-hand panels together total 100%, whereas I want the yellow bars in the left-hand panel to total 100%, the purple bars in the left-hand panel to total 100%, and so on.

I appreciate that this is doable by precomputing the percentages and using stat = 'identity', but is there a way to do this in ggplot without wrangling the data frame prior to plotting?

library(ggplot2)
library(dplyr)  # provides %>%, filter() and select()

tmp <- diamonds %>% filter(color %in% c("E","I")) %>% select(color, cut, clarity)

ggplot(data=tmp,
     aes(x=clarity,
         fill=cut)) + 
  geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge") +
  scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))



Solution

  • When you compute percentages inside ggplot2, you have to group the data just as you would when summarizing it before passing it to ggplot. In your case, the PANEL column that ggplot2 adds internally to the layer data can be used for the grouping.

    Using after_stat() and ave() to compute the sum of the counts per panel, this can be achieved like so:

    library(ggplot2)  
    library(dplyr)
    
    tmp <- diamonds %>% 
        filter(color %in% c("E","I")) %>% 
        select(color, cut, clarity)
    
    ggplot(
      data = tmp,
      aes(
        x = clarity,
        fill = cut
      )
    ) +
      geom_bar(
        # divide each bar's count by the total count of its facet panel
        aes(y = after_stat(count / ave(count, PANEL, FUN = sum))),
        position = "dodge"
      ) +
      scale_y_continuous(labels = scales::percent) +
      facet_wrap(vars(color))
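
To see what ave() contributes in the call above, here is a minimal base-R sketch on a made-up data frame. The name `bars` and its values are hypothetical stand-ins for the per-bar counts, not taken from diamonds. ave() returns the group total repeated for every row of that group, so dividing by it gives shares that sum to 1 within each panel.

```r
# Hypothetical stand-in for the counts ggplot2 computes per bar
bars <- data.frame(
  PANEL = factor(c(1, 1, 1, 2, 2)),
  count = c(10, 30, 60, 5, 15)
)

# ave() returns one value per row: the sum of count within that row's PANEL
panel_total <- ave(bars$count, bars$PANEL, FUN = sum)

# Share of each bar within its panel
bars$pct <- bars$count / panel_total

print(bars)
```

Because the result has the same length and order as the input, it can be used directly inside after_stat() without reshaping the data.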
    

    EDIT: If you need to group by more than one variable (for example, so that each cut sums to 100% within each facet), I would suggest using a helper function. Here I use dplyr for the computations:

    # Share of each bar within its (PANEL, cut) group; returns one value per input row
    comp_pct <- function(count, PANEL, cut) {
      data.frame(count, PANEL, cut) %>% 
        group_by(PANEL, cut) %>% 
        mutate(pct = count / sum(count)) %>% 
        pull(pct)
    }
    
    ggplot(data=tmp,
           aes(x=clarity,
               fill=cut)) + 
      # in the computed layer data, the cut levels are stored in the fill column
      geom_bar(aes(y = after_stat(comp_pct(count, PANEL, fill))), position="dodge") +
      scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
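
As a possible alternative to the helper, ave() itself accepts several grouping vectors, so something like after_stat(count / ave(count, PANEL, fill, FUN = sum)) might work in the aes() call above. That is an assumption I have not checked against every ggplot2 version. The base-R sketch below only verifies the arithmetic on a made-up data frame (the name `bars` and all values are hypothetical): after dividing by the per-group total, each (PANEL, fill) combination sums to 1.

```r
# Hypothetical stand-in for the computed layer data: one row per bar
bars <- data.frame(
  PANEL = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  fill  = factor(c("Fair", "Fair", "Ideal", "Ideal",
                   "Fair", "Fair", "Ideal", "Ideal")),
  count = c(10, 30, 20, 20, 5, 15, 40, 10)
)

# Total count per (PANEL, fill) combination, repeated for every row of that group
group_total <- ave(bars$count, bars$PANEL, bars$fill, FUN = sum)
bars$pct <- bars$count / group_total

# Sum the shares per (PANEL, fill) group to confirm each group totals 1
check <- aggregate(pct ~ PANEL + fill, data = bars, FUN = sum)
print(check)
```

The same check can be applied to a real plot by inspecting the output of layer_data() on the plot object.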