r, plot, bar-chart, factors, tapply

Barplots with multiple factor groupings and mean of variable across those factors


I am trying to create a barplot that shows the average hourly wages of union and nonunion workers, grouped by marital status (single or married) and by college graduation (graduate or not). While I've managed to construct a passable barplot with two factor groupings, I cannot figure out how to do so with three. The examples I have seen with three factors plot only frequency counts, so I'm not sure how to incorporate the mean of another variable across all the factors into the plot.

What I am looking to create is something that looks like this (created in Stata):

[Figure: Average Hourly Wage by Union Status, Marital Status, and College Graduation]

My code looks like this:

levelbar = tapply(wage, list(as.factor(union), as.factor(married), 
as.factor(collgrad)), mean)
par(mfrow = c(1, 2))
barplot(levelbar, beside = TRUE)
barplot(t(levelbar), beside = TRUE)

When I run this, however, I receive the error:

Error in barplot.default(levelbar, beside = TRUE) : 
'height' must be a vector or a matrix

Any help on this would be appreciated. I'm sure ggplot might be useful here, but I do not have a great deal of experience using that package.


Solution

  • Here's a reproducible example using ggplot2 and the built-in Titanic dataset.

    Note that we calculate the group means first and pass stat = "identity" to geom_bar() so that the bars show those precomputed values rather than row counts.

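    library(dplyr)    # %>%, as_tibble(), group_by(), summarise()
    library(ggplot2)  # ggplot(), geom_bar(), facet_wrap()
    
    # Convert the built-in Titanic table to a tibble (columns: Class, Sex, Age, Survived, n)
    Titanic_df <- Titanic %>% as_tibble()
    
    # Make Class, Sex, Age, and Survived factors
    for (col in c("Class", "Sex", "Age", "Survived")) {
      Titanic_df[[col]] <- factor(Titanic_df[[col]])
    }
    
    # Get the mean of n for each Class/Sex/Survived group (averaging over Age)
    means <- Titanic_df %>% 
      group_by(Class, Sex, Survived) %>% 
      summarise(
        mean_n = mean(n),
        .groups = "drop"  # return an ungrouped tibble and silence the grouping message
      )
    
    # Plot: facets are the Classes, bar colors are the two Sexes, and the groupings in each facet are Survived vs. Not Survived
    ggplot(data = means) +
      geom_bar(aes(x = Survived, y = mean_n, fill = Sex), stat = "identity", position = "dodge") +
      facet_wrap(~ Class)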
    

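As a base R alternative, here is a minimal sketch of why the original call fails and one way around it. With three grouping factors, tapply() returns a three-dimensional array, while barplot() accepts only a vector or a matrix. The sketch uses the built-in Titanic data as a stand-in for the wage data (an assumption, since the original data set is not shown), and the names df, arr, mat, and combos are purely illustrative.

```r
# Stand-in data: the built-in Titanic table as a data frame
# (columns Class, Sex, Age, Survived, Freq). The wage data is not available here.
df <- as.data.frame(Titanic)

# Three grouping factors give a 3-D array (Sex x Survived x Class),
# which is what triggers "'height' must be a vector or a matrix".
arr <- tapply(df$Freq, list(df$Sex, df$Survived, df$Class), mean)
dim(arr)

# Fold the second and third dimensions into columns:
# rows are Sex, columns are every Survived-by-Class combination.
mat <- matrix(arr, nrow = dim(arr)[1])
rownames(mat) <- dimnames(arr)[[1]]
combos <- expand.grid(dimnames(arr)[-1])
colnames(mat) <- paste(combos[[1]], combos[[2]], sep = "\n")

# One cluster of bars per column, one bar per Sex within each cluster
barplot(mat, beside = TRUE, legend.text = TRUE, cex.names = 0.7,
        ylab = "Mean count")
```

With the wage data, the same pattern should apply by replacing df$Freq with wage and the three grouping factors with union, married, and collgrad, though that has not been tested against the original data.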