
Stacked AND side-by-side barplot in ggplot()


I have a data frame with three factor columns. One is a 'SurveyDate' column, and the others are attributes of the survey takers; say one is 'Gender' and one is 'HighSchoolGraduate'.

I want to create a plot with the date on the x-axis and side-by-side bars for the number of male and female respondents, where each of those two bars is stacked by high school graduate vs. non-graduate.

    testDates <- sample(seq(as.Date('2019/1/1'), as.Date('2019/2/1'), by = "day"), 100, replace = TRUE)
    gender <- sample(c("F", "M"), 100, replace = TRUE)
    graduate <- sample(c("Y", "N"), 100, replace = TRUE)
    testdf <- data.frame(testDates, gender, graduate)

I can create a table of frequencies of dates vs. gender and use it to create the side-by-side plot:

    library(ggplot2)
    # Count respondents per date and gender
    tbl <- with(testdf, table(testDates, gender))
    ggplot(as.data.frame(tbl), aes(x = testDates, y = Freq, fill = gender)) +
      geom_col(position = 'dodge')

This yields a plot of dates vs. count, with the two genders side by side for each date.

So now... how do I divide each of those bars by graduate? (And yes, I should have created more samples for this demo, but the idea still works.)


Solution

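  • Using group and fill you can get close to the output you describe. However, as the resulting plot shows, this is probably not a good way to visualize the data: position = 'dodge' only separates the groups given by group (gender), so the Y and N bars of one gender are drawn at the same position on top of each other rather than stacked, and the visible bar height is the larger of the two counts, not their sum: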

    library(ggplot2)
    testDates <- sample(seq(as.Date('2019/1/1'), as.Date('2019/2/1'), by = "day"), 100, replace = TRUE)
    gender <- sample(c("F", "M"), 100, replace = TRUE)
    graduate <- sample(c("Y", "N"), 100, replace = TRUE)
    testdf <- data.frame(testDates, gender, graduate)

    # Count respondents per date, gender and graduate status
    tbl <- with(testdf, table(testDates, gender, graduate))
    # Dodge by gender; graduate status only sets the fill colour
    ggplot(as.data.frame(tbl), aes(x = testDates, y = Freq, group = gender, fill = graduate)) +
      geom_col(position = 'dodge')
    

    Created on 2019-10-24 by the reprex package (v0.3.0)
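To see why this layout is misleading, you can inspect the computed bar positions with ggplot2's layer_data(). The sketch below rebuilds the plot from above (the set.seed() call is an assumption added only to make the check reproducible) and checks that every bar still starts at y = 0 and that the Y and N bars of one gender on one date share the same horizontal slot, i.e. they overlap instead of stacking.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the check is reproducible
testDates <- sample(seq(as.Date('2019/1/1'), as.Date('2019/2/1'), by = "day"), 100, replace = TRUE)
gender    <- sample(c("F", "M"), 100, replace = TRUE)
graduate  <- sample(c("Y", "N"), 100, replace = TRUE)
testdf    <- data.frame(testDates, gender, graduate)
tbl       <- with(testdf, table(testDates, gender, graduate))

p <- ggplot(as.data.frame(tbl), aes(x = testDates, y = Freq, group = gender, fill = graduate)) +
  geom_col(position = 'dodge')

# layer_data() builds the plot and returns the computed data of the first layer
ld <- layer_data(p)

# Dodging only shifts bars horizontally: every bar still starts at y = 0 ...
print(all(ld$ymin == 0))

# ... and the Y and N bars of one gender on one date share the same xmin,
# so there are only half as many distinct bar slots as there are bars
slots <- unique(data.frame(xmin = ld$xmin, group = ld$group))
print(nrow(slots) == nrow(ld) / 2)
```

Both checks should print TRUE, which is another way of saying that the fill colours are painted over each other rather than piled up.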

    Update

    With interaction() you can encode both factors on the fill scale, so each gender/graduate combination gets its own colour:

    ggplot(as.data.frame(tbl), aes(x = testDates, y = Freq, group = gender,
                                   fill = interaction(gender, graduate))) +
      geom_col(position = 'dodge')
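If you want graduate status genuinely stacked inside side-by-side gender bars, one alternative (not part of the original answer) is to move the date into facets: map gender to x and graduate to fill, let geom_col() use its default stacking, and draw one panel per date. This is a minimal sketch; the set.seed() call and the theme settings are illustrative assumptions.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only for reproducibility
testDates <- sample(seq(as.Date('2019/1/1'), as.Date('2019/2/1'), by = "day"), 100, replace = TRUE)
gender    <- sample(c("F", "M"), 100, replace = TRUE)
graduate  <- sample(c("Y", "N"), 100, replace = TRUE)
testdf    <- data.frame(testDates, gender, graduate)

# Long-format counts: one row per date x gender x graduate combination
counts <- as.data.frame(with(testdf, table(testDates, gender, graduate)))

# Within each date panel the two gender bars sit side by side on x, and
# graduate status is stacked inside each bar (geom_col() stacks by default)
p <- ggplot(counts, aes(x = gender, y = Freq, fill = graduate)) +
  geom_col() +
  facet_grid(~ testDates, switch = "x") +  # strips below the panels act as the date axis
  theme(strip.placement = "outside",
        strip.text.x = element_text(angle = 90),
        panel.spacing.x = unit(0.1, "lines"))
```

Printing p draws the chart. The trade-off is that the dates become strip labels rather than a continuous date axis, and only dates that occur in the data get a panel, because table() only creates levels for observed dates.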