ggplot: showing % instead of counts in charts of categorical variables with multiple levels


I would like to create a barplot like this:

library(ggplot2)

# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar(position="dodge")

However, instead of counts, I want the percentage of observations that fall into each 'clarity' category, computed separately within each cut category ('Fair', 'Good', 'Very Good', ...).

With this ...

# Dodged bar charts
ggplot(diamonds, aes(clarity, fill=cut)) + 
geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge")

I get percentages on the y-axis, but they are computed against the total number of observations and ignore the cut factor. I want all the red bars to sum to 1, all the yellow bars to sum to 1, and so on.

Is there an easy way to make that work without having to prepare the data manually?

Thanks!

P.S.: This is a follow-up to this Stack Overflow question


Solution

  • You could use sjp.xtab from the sjPlot package for that (recent sjPlot releases call this function plot_xtab, with renamed arguments):

    sjp.xtab(diamonds$clarity, 
             diamonds$cut, 
             showValueLabels = FALSE,
             tableIndex = "row", 
             barPosition = "stack")
    


    To prepare the data for stacked bars in which each clarity level sums to 100%, compute row proportions of the clarity-by-cut table:

    data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
    

    thus, you could write

    mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
    ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
      geom_bar(position = "stack", stat = "identity") +
      scale_y_continuous(labels=scales::percent)
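
    The second argument of prop.table selects the margin that the proportions are computed within: 1 normalizes each row, 2 normalizes each column. A minimal sketch on a made-up 2 x 2 table (the counts and the dimension labels are hypothetical, chosen only for illustration):

    ```r
    # Hypothetical counts: rows play the role of clarity, columns the role of cut
    counts <- matrix(c(1, 3, 2, 2), nrow = 2,
                     dimnames = list(clarity = c("a", "b"), cut = c("x", "y")))

    row_share <- prop.table(counts, 1)  # proportions within each row
    col_share <- prop.table(counts, 2)  # proportions within each column

    rowSums(row_share)  # every row sums to 1
    colSums(col_share)  # every column sums to 1
    ```

    With the table built as table(clarity, cut), margin 1 makes each clarity level sum to 1, while margin 2 makes each cut sum to 1.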
    

    Edit: To make each cut category (Fair, Good, ...) sum to 100% instead, use margin 2 in prop.table together with position = "dodge":

    mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),2))
    ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
        geom_bar(position = "dodge", stat = "identity") +
        scale_y_continuous(labels=scales::percent)
    

    or

    sjp.xtab(diamonds$clarity, 
             diamonds$cut, 
             showValueLabels = FALSE,
             tableIndex = "col")
    


    Verifying the last example with dplyr, summing up percentages within each group:

    library(dplyr)
    mydf %>% group_by(Var2) %>% summarise(percsum = sum(Freq))
    
    >        Var2 percsum
    > 1      Fair       1
    > 2      Good       1
    > 3 Very Good       1
    > 4   Premium       1
    > 5     Ideal       1
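
    The same per-cut shares can also be prepared with dplyr rather than prop.table. This is only a sketch of an alternative, not part of the original answer; the names cut_share and share are made up for illustration:

    ```r
    library(dplyr)
    library(ggplot2)

    # Count each cut/clarity combination, then divide by the total within each cut
    cut_share <- diamonds %>%
      count(cut, clarity) %>%
      group_by(cut) %>%
      mutate(share = n / sum(n)) %>%
      ungroup()

    # Dodged bars; the bars of one fill colour (one cut) sum to 100%
    p <- ggplot(cut_share, aes(clarity, share, fill = cut)) +
      geom_col(position = "dodge") +
      scale_y_continuous(labels = scales::percent)
    p
    ```

    geom_col is equivalent to geom_bar(stat = "identity"), so the plot should match the dodged prop.table version while keeping the original column names cut and clarity instead of Var1 and Var2.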
    

    (see this page for further plot-options and examples from sjp.xtab...)