I have data that looks like this:
df <- data.frame(
  cancer = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  CVD = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  diab = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  stroke = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0),
  asthma = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0),
  SR_hlt = c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1))
What I want to do is produce a bar plot, only for the people who have the disease of interest, where the bars of the bar plot are ordered by the proportion of people whose SR_hlt == 1.
To make this plot, I use the following code:
1) Gather the data
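library(dplyr)
library(tidyr)
library(ggplot2)
df_grp <- df %>%
  gather(key = condition, value = Y_N, -SR_hlt) %>%
  group_by(condition, Y_N, SR_hlt) %>%
  # summarise() drops the SR_hlt grouping, so freq is a percentage within condition and Y_N
  summarise(count = n()) %>%
  mutate(freq = round(count/sum(count) * 100, digits = 1))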
2) Plot this data
df_plot <- df_grp %>%
  filter(Y_N == 1) %>%
  ggplot(aes(x = reorder(condition, -freq), y = freq, fill = factor(SR_hlt)), width=0.5) +
  geom_bar(stat="identity", position = position_dodge(0.9))
df_plot
The x = reorder(condition, -freq) is supposed to order the bars, but I don't think it works in this case, because the freq values depend on the value of a third variable, SR_hlt. Is it possible to order the bars by the value of freq when SR_hlt == 1?
This can be accomplished using the handy package forcats, specifically fct_reorder2:
library(forcats)
df_plot <- df_grp %>%
  filter(Y_N == 1) %>%
  # reorder the condition levels using SR_hlt and -freq
  ggplot(aes(x = fct_reorder2(condition, SR_hlt, -freq),
             y = freq, fill = factor(SR_hlt)), width=0.5) +
  geom_bar(stat="identity", position = position_dodge(0.9))
df_plot
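This converts condition to a factor and reorders its levels. By default, fct_reorder2 summarises each condition with last2(), which takes the value of -freq at the largest SR_hlt (here SR_hlt == 2), and then sorts the levels in descending order of that summary. Because the two freq values for each condition sum to 100, this gives the same order as sorting by freq at SR_hlt == 1 from high to low, as long as every condition has rows for both SR_hlt values.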
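As far as I can tell from the forcats documentation, the default summary in fct_reorder2 looks at the largest SR_hlt, so the call above depends on there being exactly two SR_hlt groups. A more explicit variant is sketched below. It assumes your forcats version exports first2(), and plot_data is a name introduced here for illustration. It recomputes df_grp from the question's data so that it runs on its own.

```r
library(dplyr)
library(tidyr)
library(forcats)

df <- data.frame(
  cancer = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  CVD    = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  diab   = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  stroke = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0),
  asthma = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0),
  SR_hlt = c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1)
)

# Same summary as df_grp in the question, returned ungrouped
df_grp <- df %>%
  gather(key = condition, value = Y_N, -SR_hlt) %>%
  group_by(condition, Y_N, SR_hlt) %>%
  summarise(count = n(), .groups = "drop_last") %>%
  mutate(freq = round(count / sum(count) * 100, digits = 1)) %>%
  ungroup()

# first2() summarises each condition by the freq found at the smallest SR_hlt;
# fct_reorder2() then sorts the levels in descending order of that summary
plot_data <- df_grp %>%
  filter(Y_N == 1) %>%
  mutate(condition = fct_reorder2(condition, SR_hlt, freq, .fun = first2))

levels(plot_data$condition)
```

plot_data can then be passed to ggplot with x = condition. Conditions with equal freq at SR_hlt == 1 (CVD and diab here) are ties, so I would not rely on their relative order.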
Alternatively, you can set the factor levels before the ggplot call using standard dplyr only:
df_plot <- df_grp %>%
  ungroup() %>%
  filter(Y_N == 1) %>%
  arrange(SR_hlt, desc(freq)) %>%
  mutate(condition = factor(condition, unique(condition))) %>%
  ggplot(aes(x = condition, y = freq, fill = factor(SR_hlt)), width=0.5) +
  geom_bar(stat="identity", position = position_dodge(0.9))
df_plot
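In the above, I use arrange to sort the data frame by SR_hlt and then by descending freq, so the SR_hlt == 1 rows come first, ordered from highest to lowest freq. Next, I use mutate to take advantage of the sorted data frame by converting condition to a factor whose levels follow the order of first appearance.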
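As a cross-check on either approach, the intended bar order can be computed straight from df in base R. This is only a sketch: conditions, prop_sr1 and level_order are names introduced here for illustration, not part of the original code.

```r
df <- data.frame(
  cancer = c(1, 0, 1, 0, 0, 1, 0, 0, 0, 0),
  CVD    = c(0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
  diab   = c(0, 0, 0, 1, 0, 1, 0, 0, 1, 0),
  stroke = c(0, 1, 1, 0, 1, 0, 0, 0, 1, 0),
  asthma = c(1, 1, 1, 0, 1, 1, 0, 0, 0, 0),
  SR_hlt = c(1, 2, 2, 2, 1, 1, 2, 2, 2, 1)
)

# Every column except SR_hlt is treated as a condition indicator
conditions <- setdiff(names(df), "SR_hlt")

# Share of people with SR_hlt == 1 among those who have each condition
prop_sr1 <- sapply(df[conditions], function(has_condition) {
  mean(df$SR_hlt[has_condition == 1] == 1)
})

# Condition names from the highest to the lowest share
level_order <- names(sort(prop_sr1, decreasing = TRUE))
level_order
```

If you prefer, level_order can be used directly in the mutate step as factor(condition, levels = level_order) in place of unique(condition). That version does not depend on how the summarised rows happen to be sorted.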