
reordering a factor A by the numeric values of a factor B


Hi there: I have a data set like the one simulated below. In my data set, alpha, omega and zeta are the names of issues. Respondents were asked to name the party leader ('Z', 'B' or 'C') who would best manage each issue.

I would like to show the distribution of responses for each issue, with the facets ordered so that the first facet is the issue with the highest percentage for a particular party leader (e.g. Z) and the remaining facets follow in descending order.

In the code below, I have deliberately chosen variable names that span the alphabet (alpha to zeta) and have not set a seed, because I want code that always orders the levels of the variable Issue so that the first level is the issue on which party leader Z scored highest, the second level is the issue on which Z scored second-highest, and so on.

#load libraries
library(dplyr)
library(forcats)
library(tidyr)
library(ggplot2)

#In my data set these are issues, like taxes, health, etc. 
alpha<-sample(c('Z', 'B', 'C'), replace=TRUE, size=300)
omega<-sample(c('Z', 'B', 'C'), replace=TRUE, size=300)
zeta<-sample(c('Z', 'B', 'C'), replace=TRUE, size=300)

#make data frame
df<-data.frame(alpha, omega, zeta)

df %>% 
  #gather into an issue variable and a leader variable
  gather(Issue, Leader) %>% 
  #count
  count(Issue, Leader) %>% 
  #form groups for counting percent
  group_by(Issue) %>% 
  #calculate percent
  mutate(pct=n/sum(n)) %>%
  #regroup by Leader
  group_by(Leader)%>% 
  #try reordering Issue based on pct
  mutate(Issue=fct_reorder(Issue, pct, .desc=F)) %>% 
  ggplot(., aes(x=Leader, y=pct))+geom_col()+facet_wrap(~Issue)

Solution

  • For such a specific use case, I would find and set the order explicitly:

    df %>% 
      gather(Issue, Leader) %>% 
      count(Issue, Leader) %>% 
      group_by(Issue) %>% 
      mutate(pct=n/sum(n)) %>% 
      ungroup -> 
      plot_df
    
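    # Issues sorted by leader Z's share, highest first
    issue_order = filter(plot_df, Leader == "Z") %>% 
        arrange(desc(pct)) %>% 
        pull(Issue) %>%
        as.character
    
    # facet_wrap() draws panels in factor-level order, so this sets the facet order
    plot_df = mutate(plot_df, Issue = factor(Issue, levels = issue_order))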
    
    ggplot(plot_df, aes(x=Leader, y=pct))+geom_col()+facet_wrap(~Issue)
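
    A variant that stays closer to the fct_reorder() attempt in the question is sketched below. In the question, fct_reorder() runs separately inside each Leader group, so each leader gets its own ordering rather than one ordering driven by Z. Here, leader Z's share is computed once per issue, copied onto every row of that issue, and Issue is reordered by that single number. The counts are made-up fixed values (not the question's random data) so the expected order can be checked, and z_pct is a helper column name introduced only for this example.

    ```r
    library(dplyr)
    library(forcats)

    # Made-up fixed counts in the shape of count(Issue, Leader) output;
    # used instead of sample() so the resulting order is predictable
    counts <- data.frame(
      Issue  = rep(c("alpha", "omega", "zeta"), each = 3),
      Leader = rep(c("B", "C", "Z"), times = 3),
      n      = c(40, 40, 20,   # alpha: Z has 20%
                 10, 30, 60,   # omega: Z has 60%
                 30, 30, 40),  # zeta:  Z has 40%
      stringsAsFactors = FALSE
    )

    plot_df <- counts %>%
      group_by(Issue) %>%
      mutate(
        pct   = n / sum(n),
        # Z's share for this issue, repeated on every row of the issue;
        # sum() gives 0 if Z received no responses on an issue
        z_pct = sum(pct[Leader == "Z"])
      ) %>%
      ungroup() %>%
      # z_pct is constant within each issue, so the default summary
      # (median) just returns it; .desc = TRUE puts Z's best issue first
      mutate(Issue = fct_reorder(Issue, z_pct, .desc = TRUE))

    levels(plot_df$Issue)
    #> [1] "omega" "zeta"  "alpha"
    ```

    The resulting plot_df can go into the same ggplot() call as above. One design difference from the explicit-levels approach: an issue where Z received no responses is kept and ranked last with a share of 0, whereas filter(Leader == "Z") would leave it out of issue_order and turn its rows into NA.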
    

    As a side note, I'd encourage you to improve your comments by dropping the obvious ones. Commenting code is good, but well-written code (especially dplyr code) is largely self-documenting. A common best practice is "comment why, not how": the code already tells you what happens, so comments are needed mostly to explain why. Comments like the one below add no value; they break up your meaningful code and make it harder to read:

    #count
    count(Issue, Leader) %>% 
    

    Here, you use a descriptive variable name for the percentage, pct, so you don't need a comment to tell you what it is:

    #calculate percent
    mutate(pct=n/sum(n)) %>%
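
    To illustrate the difference, here is a sketch of the same percentage step with a "why" comment in place of the "how" comments. The three-row data frame and the names responses and shares are made up for this example.

    ```r
    library(dplyr)

    # Made-up responses for two issues
    responses <- data.frame(
      Issue  = c("alpha", "alpha", "omega"),
      Leader = c("Z", "B", "Z"),
      stringsAsFactors = FALSE
    )

    shares <- responses %>%
      count(Issue, Leader) %>%
      # Shares are taken within each issue, not over all responses, so the
      # leaders' bars in each facet add up to 1 and facets are comparable
      group_by(Issue) %>%
      mutate(pct = n / sum(n)) %>%
      ungroup()

    shares$pct
    #> [1] 0.5 0.5 1.0
    ```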