Order bars in a ggplot bar plot by proportion depending only on the 'yes' value of another variable in R


I have data that looks like this:

df <- data.frame(
  cancer = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  CVD    = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  diab   = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  stroke = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0),
  asthma = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0),
  SR_hlt = c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1)
)

I want to produce a bar plot, restricted to the people who have the disease of interest, in which the bars are ordered by the proportion of people with SR_hlt == 1.

To make this plot, I use the following code:

1) Gather the data

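library(dplyr)
library(tidyr)

df_grp <- df %>%
  gather(key = condition, value = Y_N, -SR_hlt) %>%  # wide to long: one row per person and condition
  group_by(condition, Y_N, SR_hlt) %>%
  summarise(count = n()) %>%                         # result stays grouped by condition and Y_N
  mutate(freq = round(count / sum(count) * 100, digits = 1))  # percent within condition and Y_N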
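
For reference, tidyr documents gather() as superseded by pivot_longer(). A minimal sketch of the same summary with pivot_longer() and count(), assuming tidyr 1.0.0 or later; the name df_grp2 is only illustrative and is not part of the original code:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  cancer = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  CVD    = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  diab   = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  stroke = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0),
  asthma = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0),
  SR_hlt = c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1)
)

df_grp2 <- df %>%
  # wide to long: every column except SR_hlt becomes a (condition, Y_N) pair
  pivot_longer(-SR_hlt, names_to = "condition", values_to = "Y_N") %>%
  # number of people per condition, disease status and self-rated health
  count(condition, Y_N, SR_hlt, name = "count") %>%
  # percentage of each SR_hlt value within a condition and disease status
  group_by(condition, Y_N) %>%
  mutate(freq = round(count / sum(count) * 100, digits = 1)) %>%
  ungroup()
```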

2) Plot this data

library(ggplot2)

df_plot <- df_grp %>%
  filter(Y_N == 1) %>%
  ggplot(aes(x = reorder(condition, -freq), y = freq, fill = factor(SR_hlt)), width = 0.5) +
  geom_bar(stat = "identity", position = position_dodge(0.9))
df_plot

I expected x = reorder(condition, -freq) to order the bars, but it does not seem to work here, presumably because the freq values depend on a third variable, SR_hlt. Is it possible to order the bars by freq using only the rows where SR_hlt == 1?


Solution

  • This can be done with the forcats package, specifically fct_reorder2:

    library(forcats)

    df_plot <- df_grp  %>%
      filter(Y_N == 1) %>%
      ggplot(aes(x = fct_reorder2(condition, SR_hlt, -freq),
                 y = freq, fill = factor(SR_hlt)), width=0.5) +
      geom_bar(stat="identity", position = position_dodge(0.9))
    df_plot
    

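    fct_reorder2 turns condition into a factor and scores each level with its default summary function, last2, which takes the value of the third argument (here -freq) at the largest value of the second (here SR_hlt). It then sorts the levels in descending order of that score, so the conditions end up ordered by ascending freq for SR_hlt == 2. Because SR_hlt takes only two values and the two percentages within a condition sum to 100, this is the same as ordering by descending freq for SR_hlt == 1. Note that the equivalence relies on every plotted condition having rows for both SR_hlt values.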
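
    To see why plain reorder() appears to do nothing here: by default it scores each condition with the mean of -freq. After filtering to Y_N == 1, every condition in the sample has one row per SR_hlt value and the two percentages sum to 100, so every score is -50 and the levels tie. The sketch below checks this and prints the level order that fct_reorder2 produces. The percentages are worked out by hand from the sample data, and the name plot_data is only illustrative.

    ```r
    library(forcats)

    # freq values for the Y_N == 1 rows, computed by hand from the sample data
    plot_data <- data.frame(
      condition = rep(c("asthma", "cancer", "CVD", "diab", "stroke"), each = 2),
      SR_hlt    = rep(c(1, 2), times = 5),
      freq      = c(60, 40, 66.7, 33.3, 33.3, 66.7, 33.3, 66.7, 25, 75)
    )

    # reorder() would score each condition by mean(-freq); all scores tie at -50
    reorder_scores <- tapply(-plot_data$freq, plot_data$condition, mean)
    print(reorder_scores)

    # fct_reorder2() scores each condition by -freq at the largest SR_hlt
    # and sorts the levels in descending order of that score
    lv <- levels(fct_reorder2(plot_data$condition, plot_data$SR_hlt, -plot_data$freq))
    print(lv)
    ```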


    Alternatively, you can set the factor before the ggplot call using standard dplyr only:

    df_plot <- df_grp  %>%
      ungroup() %>% 
      filter(Y_N == 1) %>%
      arrange(SR_hlt, desc(freq)) %>% 
      mutate(condition = factor(condition, unique(condition))) %>% 
      ggplot(aes(x = condition, y = freq, fill = factor(SR_hlt)), width=0.5) +
      geom_bar(stat="identity", position = position_dodge(0.9))
    df_plot
    

    Here, arrange sorts the data frame by SR_hlt and then by descending freq, so the SR_hlt == 1 rows come first, ordered from highest to lowest freq. mutate then converts condition to a factor whose levels follow the order of first appearance in that sorted data frame.
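
    If a condition could be missing rows for one of the SR_hlt values, it may be safer to compute the ordering key explicitly rather than rely on row order. A minimal sketch under that assumption, again using hand-computed percentages from the sample data; freq_sr1, plot_data and p are illustrative names, and a condition with no SR_hlt == 1 row is assumed to count as 0:

    ```r
    library(dplyr)
    library(ggplot2)

    # freq values for the Y_N == 1 rows, computed by hand from the sample data
    plot_data <- data.frame(
      condition = rep(c("asthma", "cancer", "CVD", "diab", "stroke"), each = 2),
      SR_hlt    = rep(c(1, 2), times = 5),
      freq      = c(60, 40, 66.7, 33.3, 33.3, 66.7, 33.3, 66.7, 25, 75)
    )

    plot_data <- plot_data %>%
      group_by(condition) %>%
      # share of SR_hlt == 1 per condition; sum() gives 0 when no such row exists
      mutate(freq_sr1 = sum(freq[SR_hlt == 1])) %>%
      ungroup() %>%
      # order the factor levels from the highest to the lowest SR_hlt == 1 share
      mutate(condition = reorder(condition, -freq_sr1))

    p <- ggplot(plot_data, aes(x = condition, y = freq, fill = factor(SR_hlt))) +
      geom_col(position = position_dodge(0.9))
    ```

    Calling print(p) draws the plot. geom_col() is used here as shorthand for geom_bar(stat = "identity").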