Tags: r, ggplot2, stacked-bar-chart, likert

How do I change the order of x axis variables and add another layer of x axis labelling for before and after (in R)?


I have gone through and followed this guide on how to make stacked bar plots with percentages: Plot stacked bar chart of likert variables in R

Issue 1: it has organised the bars alphabetically, not in the order I had them.

Issue 2: I have Before and After responses for each of 5 questions, and I cannot figure out how to put "Before" and "After" underneath each stacked bar, with "Question 1" below that pair.

Issue 3: I would also like each question's pair of stacked bars to be separated a little from the other questions.

[Image: my plot currently]

Here is a snippet of my data: [image: data]

This is the code I have used:

    graphdata3 <- graphdata3 %>% gather(key='Question_num', value='Answer', -Participant)

    graphdata3$Answer <- factor(graphdata3$Answer,
                           levels=5:1,
                           labels=c('Strongly Agree','Agree','Neutral','Disagree','Strongly Disagree'))

    ggplot(graphdata3, aes(x=Question_num)) +
      geom_bar(aes(fill=Answer), position="fill") +
      scale_fill_brewer(palette='Spectral', direction=-1) +
      scale_y_continuous(expand=expansion(0), labels=scales::percent_format()) +
      labs(x='Questions', y='Proportion of Answers (%)') +
      theme_classic() +
      theme(legend.position='top')

Solution

  • To fix issues 2 and 3, I would suggest faceting: split Question_num into the question id and the time point ("Before" or "After"), then facet the chart by question id and map the time point to x. This also requires some styling: putting the facet labels at the bottom, placing them outside the axis, and removing the box drawn around each label.

    Concerning your first issue: if you want a specific order, convert the variable to a factor with the levels set in your desired order. Guessing that you want the bars in the order "Before" then "After", make time a factor with those levels.
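
    As a small illustration of that point, the sketch below uses only base R and a made-up vector `x` (hypothetical names, not the asker's data). It shows that a factor's levels decide the order, and that `factor(x, levels = unique(x))` is one way to keep the order of first appearance instead of the alphabetical default:

```r
# Hypothetical column names, in the order they appear in the data
x <- c("Q1.Before", "Q1.After", "Q2.Before", "Q2.After")

# Default: levels are sorted alphabetically, so "After" sorts before "Before"
alphabetical <- factor(x)
levels(alphabetical)

# Levels taken in order of first appearance keep the original ordering
as_given <- factor(x, levels = unique(x))
levels(as_given)
```

    ggplot2 draws a discrete axis in level order, so the second factor would plot "Before" ahead of "After" for each question.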

    Using some fake random example data:

    library(dplyr, warn.conflicts = FALSE)
    library(tidyr)
    library(ggplot2)
    
    #### Create example data
    set.seed(123)
    graphdata3 <- data.frame(Participant = 1:20)
    
    for (qid in 1:5) {
      for (time in c("Before", "After")) {
        graphdata3[[paste0("Q", qid, ".", time)]] <- sample(1:5, 20, replace = TRUE)
      }
    }
    ####
    
    graphdata3 <- graphdata3 |>
      tidyr::pivot_longer(-Participant, names_to = "Question_num", values_to = "Answer") |>
      tidyr::separate(Question_num, into = c("qid", "time"), sep = "\\.") |>
      mutate(time = factor(time, c("Before", "After")))
    
    graphdata3$Answer <- factor(graphdata3$Answer,
      levels = 5:1,
      labels = c("Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree")
    )
    
    ggplot(graphdata3, aes(x = time)) +
      geom_bar(aes(fill = Answer), position = "fill") +
      scale_fill_brewer(palette = "Spectral", direction = -1) +
      scale_y_continuous(expand = expansion(0), labels = scales::percent_format()) +
      facet_wrap(~qid, nrow = 1, strip.position = "bottom") +
      labs(x = "Questions", y = "Proportion of Answers (%)") +
      theme_classic() +
      theme(
        legend.position = "top",
        strip.placement = "outside",
        strip.background.x = element_blank()
      )
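
If the gap between the question groups (issue 3) still looks too small, one option is to widen the space between facet panels with the `panel.spacing.x` theme element. The sketch below is self-contained and uses a small made-up data frame `toy` rather than the data above; the value of 2 lines is an arbitrary assumption to tune by eye.

```r
library(ggplot2)

# Made-up data: 2 questions x 2 time points, 20 answers each, coded 1-5
set.seed(1)
toy <- data.frame(
  qid = rep(c("Q1", "Q2"), each = 40),
  time = factor(
    rep(rep(c("Before", "After"), each = 20), times = 2),
    levels = c("Before", "After")
  ),
  Answer = factor(sample(1:5, 80, replace = TRUE), levels = 5:1)
)

p <- ggplot(toy, aes(x = time)) +
  geom_bar(aes(fill = Answer), position = "fill") +
  facet_wrap(~qid, nrow = 1, strip.position = "bottom") +
  theme_classic() +
  theme(
    strip.placement = "outside",
    strip.background.x = element_blank(),
    # Horizontal gap between question panels; 2 lines is an arbitrary choice
    panel.spacing.x = unit(2, "lines")
  )

p
```

The same `panel.spacing.x` line can be added to the `theme()` call in the full example; smaller or larger values tighten or loosen the grouping.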