
How to reorder the legend in ggplot and plotly?


I have a fairly simple question, but I am not sure how to solve it: I am plotting a Likert-scale bar graph.

library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)

likert_results2 <-  structure(list(Survey = c("Post survey \nN= 274", "Post survey \nN= 274", 
                                              "Post survey \nN= 274", "Post survey \nN= 274", "Post survey \nN= 274", 
                                              "Post survey \nN= 274", "Pre survey \nN= 429", "Pre survey \nN= 429", 
                                              "Pre survey \nN= 429", "Pre survey \nN= 429", "Pre survey \nN= 429", 
                                              "Pre survey \nN= 429", "Post survey \nN= 276", "Post survey \nN= 276", 
                                              "Post survey \nN= 276", "Post survey \nN= 276", "Post survey \nN= 276", 
                                              "Post survey \nN= 276", "Pre survey \nN= 428", "Pre survey \nN= 428", 
                                              "Pre survey \nN= 428", "Pre survey \nN= 428", "Pre survey \nN= 428", 
                                              "Pre survey \nN= 428"), Response = c("agree", "disagree", "neither agree nor disagree", 
                                                                                   "somewhat agree", "somewhat disagree", "strongly agree", "agree", 
                                                                                   "disagree", "neither agree nor disagree", "somewhat agree", "somewhat disagree", 
                                                                                   "strongly agree", "agree", "disagree", "neither agree nor disagree", 
                                                                                   "somewhat agree", "somewhat disagree", "strongly agree", "agree", 
                                                                                   "disagree", "neither agree nor disagree", "somewhat agree", "somewhat disagree", 
                                                                                   "strongly agree"), Question = c("q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q1", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2", 
                                                                                                                   "q2"
                                                                                   ), prop = c(0.17, 0.21, 0.08, 0.29, 0.16, 0.09, 0.14, 0.16, 0.16, 
                                                                                               0.3, 0.18, 0.07, 0.13, 0.21, 0.11, 0.29, 0.19, 0.07, 0.11, 0.18, 
                                                                                               0.18, 0.28, 0.21, 0.06)), class = c("tbl_df", "tbl", "data.frame"
                                                                                               ), row.names = c(NA, -24L))

# Create data frame with labels
prop_labels <- likert_results2 %>%
  mutate(
    position = case_when(
      Response == "somewhat disagree" | Response == "disagree" | Response == "strongly disagree" ~ "left",
      Response == "neither agree nor disagree" ~ "center",
      Response == "somewhat agree" | Response == "agree"  | Response == "strongly agree" ~ "right"
    )
  ) %>%
  group_by(Question, Survey, position) %>%
  dplyr::summarize(label = sum(prop * 100), .groups = "drop") %>%
  pivot_wider(names_from = position,
              values_from = label)

# Disagree-side responses plus half of the neutral share (plotted at positive y)
high_columns <- likert_results2 %>%
  filter( Response == "strongly disagree" |  Response == "disagree"| Response == "somewhat disagree" | Response == "neither agree nor disagree") %>%
  mutate(prop = case_when(Response == "strongly disagree" ~ prop * 100,
                          Response == "disagree" ~ prop * 100,
                          Response == "somewhat disagree" ~ prop * 100,
                          Response == "neither agree nor disagree" ~ prop / 2 * 100
  ))
# Agree-side responses plus half of the neutral share (plotted at negative y)
low_columns <- likert_results2 %>%
  filter(Response == "neither agree nor disagree" | Response == "somewhat agree" | Response == "agree" | Response == "strongly agree") %>%
  mutate(prop = case_when(Response == "neither agree nor disagree" ~ prop / 2 * 100,
                          Response == "somewhat agree" ~ prop * 100,
                          Response == "agree" ~ prop * 100,
                          Response == "strongly agree" ~ prop * 100,
  )) 
# Define empty ggplot object
p <- ggplot() +
  # Add central black line
  geom_hline(yintercept = 0,
             linetype="dashed",
             colour ="darkgrey") +
  # Add right side columns
  geom_bar(
    data = high_columns,
    mapping = aes(x = Survey,
                  y = prop,
                  fill = Response),
    position = position_stack(reverse = FALSE),
    stat = "identity"
  ) +
  # Add left side columns
  geom_bar(
    data = low_columns,
    mapping = aes(x = Survey,
                  y = -prop,
                  fill = Response),
    position = position_stack(reverse = TRUE),
    stat = "identity"
  ) +
  #Right side labels
  geom_text(
    data = prop_labels,
    mapping = aes(
      x = Survey,
      y = -100,
      label = paste(ifelse(is.na(right),0,round(right)) , "%", sep = "")),
    hjust = 1,
    color = "black",
    size = 3
  ) +
  # Central labels
  geom_text(
    data = prop_labels,
    mapping = aes(
      x = Survey,
      y = 0,
      label = paste(ifelse(is.na(center),0,round(center)) , "%", sep = "")),
    hjust = 0.5,
    color = "black",
    size = 3
  ) +
  # Left side labels
  geom_text(
    data = prop_labels,
    mapping = aes(
      x = Survey,
      y = 100,
      label = paste(ifelse(is.na(left),0,round(left)) , "%", sep = "")),
    hjust = -0.2,
    color = "black",
    size = 3
  )  +
  # Scale formatting
  scale_y_continuous(
    breaks = seq(-100, 100, 50),
    limits = c(-105, 105),
    labels = abs
  )  +
  # More formatting
  theme(legend.title = element_blank(),
        legend.position = "right",
        axis.ticks = element_blank(),
        strip.background = element_rect(fill = "#F0F0F0",
                                        color = "#F0F0F0"),
        panel.background = element_blank(),
        panel.border = element_rect(
          colour = "#F0F0F0",
          fill = NA,
          size = 1.5)
  ) +
  facet_wrap(~ Question, scales="free_y",ncol = 1) +
  coord_flip() +
  ylab("Percent of students") +
  xlab("") +
  # Change Likert labels
  scale_fill_manual(name = "Response", values = c("#1E4384","#6495CF","#7278A8","#AFA690", "#E9739B","#B54461","#B1235E") ,labels=c("strongly agree","agree","somewhat agree","neither agree nor disagree","somewhat disagree","disagree","strongly disagree"))
# Print the plot
p 

#plotly graph
ggplotly(p, width = 1200, height = 800) 

The issue is getting the items in the legend into the proper order. If I run the code without scale_fill_manual, the plot looks like this:

[![enter image description here][1]][1]

Everything is correct except the legend order.

When I specify the order with scale_fill_manual, I get the plot below. The legend labels are now in the correct order, but the colored squares next to them are not:

[![enter image description here][2]][2]

And when I run ggplotly, that command also discards the order I specified.

[1]: https://i.sstatic.net/Z53nF.png
[2]: https://i.sstatic.net/QeRnw.png


Solution

  • Your code seems to be missing some variables, so I could not reproduce your plot, but the question is best answered with an illustrative sample data frame anyway. TL;DR: use breaks= to set the order of the keys in a legend.

    The answer to your question lies in understanding how to change aspects of the legend using scale_*_manual():

    • labels=: use this to change the text shown for each legend key.

    • values=: necessary when you start setting any other arguments. If you supply a named vector, you explicitly assign a color to each level of the underlying factor associated with the data. If you supply an unnamed vector of colors, they are matched in order with the limits of the scale (by default the factor levels, which is also the default legend order), or with breaks= if you provide it.

    • breaks=: use this argument to set the order in which the legend keys appear.

    Here's the example:

    library(dplyr)
    library(tidyr)
    library(ggplot2)
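    set.seed(42)  # make the random sample reproducible
    df <- data.frame(x=1:100, Low=rnorm(100,5,1.2), Med=rnorm(100,10,2), High=rnorm(100,15,0.8))
    # Reshape to long format: one row per x/Status/Values combination
    df <- df %>% pivot_longer(-x, names_to='Status', values_to='Values')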
    
    p <- ggplot(df, aes(Status,Values)) + geom_boxplot(aes(fill=Status), alpha=0.5)
    p + scale_fill_manual(values=c('red','blue','green'))
    

    The order in which the df$Status values appear on the x axis is decided by the order of the levels in factor(df$Status). That is not what you asked, but it is good to remember. Because Status is a character column, the levels default to alphabetical order: High, Low, Med.

    The legend entries are ordered alphabetically too, because for a discrete variable the legend defaults to the order of the levels in factor(df$Status). The unnamed color vector passed to values= is assigned in that same order: 'red' to High, 'blue' to Low, 'green' to Med.
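
    The default level order can be checked directly, and it can be changed by converting the column to a factor with explicit levels. The following is a minimal sketch, assuming a small hypothetical data frame called demo that only mirrors the shape of df; the object names are illustrative and not part of the original example.

    ```r
    library(ggplot2)

    # Hypothetical data with the same shape as the answer's df (illustrative only)
    set.seed(1)
    demo <- data.frame(
      Status = rep(c("Low", "Med", "High"), each = 20),
      Values = c(rnorm(20, 5, 1.2), rnorm(20, 10, 2), rnorm(20, 15, 0.8))
    )

    # A character column gets alphabetically sorted levels when treated as a factor
    default_levels <- levels(factor(demo$Status))
    print(default_levels)

    # Explicit levels change the order used for the x axis and the default legend
    demo$Status <- factor(demo$Status, levels = c("Low", "Med", "High"))
    ordered_levels <- levels(demo$Status)
    print(ordered_levels)

    p_demo <- ggplot(demo, aes(Status, Values)) +
      geom_boxplot(aes(fill = Status), alpha = 0.5)
    ```

    Reordering the factor changes the axis and the legend together, whereas breaks= in the scale only affects the legend.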

    Note what happens if you use labels= to try to get it back to "Low, Med, High":

    p + scale_fill_manual(labels=c('Low','Med','High'), values=c('red','blue','green'))
    

    Now you should see the danger in assigning labels= with a plain vector. The labels= argument simply renames the keys in their existing order (High, Low, Med), so the High key is now labelled 'Low', the Low key 'Med', and the Med key 'High'; the order itself doesn't change. If we wanted to rename the levels, a better approach would be to send labels= a named vector:

    p + scale_fill_manual(
      labels=c('Low'='Lowest','Med'='Medium','High'='Highest'),
      values=c('red','blue','green'))
    

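
    The same naming idea applies to values=: a named color vector ties each color to one level, so the colors no longer depend on the order of the levels or of the legend. The following is a hedged sketch rather than part of the original answer; demo and status_colors are hypothetical names introduced for illustration.

    ```r
    library(ggplot2)

    # Hypothetical data with the same shape as the answer's df (illustrative only)
    set.seed(1)
    demo <- data.frame(
      Status = rep(c("Low", "Med", "High"), each = 20),
      Values = c(rnorm(20, 5, 1.2), rnorm(20, 10, 2), rnorm(20, 15, 0.8))
    )

    # Named colors: each name must match a value of Status exactly
    status_colors <- c(Low = "red", Med = "blue", High = "green")

    p_named <- ggplot(demo, aes(Status, Values)) +
      geom_boxplot(aes(fill = Status), alpha = 0.5) +
      scale_fill_manual(
        values = status_colors,
        labels = c(Low = "Lowest", Med = "Medium", High = "Highest")
      )

    # Colors are looked up by name, so the alphabetical level order is irrelevant
    colors_in_level_order <- unname(status_colors[levels(factor(demo$Status))])
    print(colors_in_level_order)
    ```

    With names on both labels= and values=, each key keeps its own text and color wherever it ends up in the legend.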
    If you want to change the order of the items in the legend, you can do that with the breaks= argument. Here, I'll show you all arguments combined:

    p + scale_fill_manual(
      labels=c('Low'='Lowest','Med'='Medium','High'='Highest'),
      values=c('red','blue','green'),
      breaks=c('Low','Med','High'))
    

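
    Applied to the Likert data in the question, the same pattern might look like the sketch below. It rests on several assumptions: toy is a hypothetical stand-in built from the first six rows of likert_results2, the colors are copied from the question's scale_fill_manual() call, and only six responses are listed because the posted data contains no "strongly disagree" rows, while that call supplies seven colors and labels.

    ```r
    library(ggplot2)

    # Desired legend order (six levels, matching the responses present in the data)
    response_order <- c("strongly agree", "agree", "somewhat agree",
                        "neither agree nor disagree",
                        "somewhat disagree", "disagree")

    # Colors taken from the question, named so each one is tied to a response
    response_colors <- c(
      "strongly agree"             = "#1E4384",
      "agree"                      = "#6495CF",
      "somewhat agree"             = "#7278A8",
      "neither agree nor disagree" = "#AFA690",
      "somewhat disagree"          = "#E9739B",
      "disagree"                   = "#B54461"
    )

    # Hypothetical toy data: the q1 "Post survey" rows of likert_results2
    toy <- data.frame(
      Survey   = "Post survey",
      Response = c("agree", "disagree", "neither agree nor disagree",
                   "somewhat agree", "somewhat disagree", "strongly agree"),
      prop     = c(0.17, 0.21, 0.08, 0.29, 0.16, 0.09)
    )

    # breaks= sets the legend order; named values= keep colors tied to responses
    p_likert <- ggplot(toy, aes(x = Survey, y = prop * 100, fill = Response)) +
      geom_col() +
      coord_flip() +
      scale_fill_manual(values = response_colors, breaks = response_order)
    ```

    The answer does not cover ggplotly(). Converting Response to a factor with levels = response_order before plotting may be worth trying there, but that is an assumption and has not been verified here.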