I am trying to use R to create a bar chart that compares the frequency distribution of one category with that of the entire dataset. I created some mock data, similar to the real data, along with my expected output. The mock data includes three fruits (apple, orange, banana), each with an eating frequency (1-2 times, 3-4 times, > 4 times). Mock data:
ID Fruit frequency
1 apple 1-2 times
2 apple 3-4 times
3 apple 1-2 times
4 apple 3-4 times
5 apple 1-2 times
6 apple > 4 times
7 orange 3-4 times
8 orange 3-4 times
9 orange 1-2 times
10 orange 1-2 times
11 orange 1-2 times
12 banana 1-2 times
13 banana 3-4 times
14 banana > 4 times
15 banana > 4 times
16 banana 1-2 times
17 banana 3-4 times
18 banana > 4 times
19 banana 1-2 times
The expected output is a bar chart with three groups, one per eating frequency (1-2 times, 3-4 times, > 4 times). Within each group there will be two columns: one representing "apple" and the other representing "the entire dataset".
I can create the frequency bar chart for a single category (such as apple), but I don't know how to add the entire dataset for comparison.
Any suggestion on which code to use or which approach to take (subsetting "apple", maybe?) will be much appreciated!
First I calculated both percentages (within each fruit and across the entire dataset), then converted the data into a plot-friendly long format.
library(ggplot2)
library(dplyr)
library(tidyr)

# df is defined under "Sample data" below
df %>%
  group_by(fruit) %>%
  mutate(countF = n()) %>%            # number of rows per fruit
  group_by(freq, .add = TRUE) %>%     # `.add` replaces the deprecated `add` (dplyr >= 1.0.0)
  # frequency percentage within fruit
  mutate(freq_perc_within_fruit = round(n() / countF * 100)) %>%
  group_by(freq) %>%
  # frequency percentage in total; `.` is the full piped data frame, so nrow(.) is the total row count
  mutate(freq_perc_in_total = round(n() / nrow(.) * 100)) %>%
  ungroup() %>%
  select(fruit, freq, freq_perc_within_fruit, freq_perc_in_total) %>%
  distinct() %>%                      # one row per fruit/freq pair, so bars and labels are not drawn repeatedly
  gather(Percentage, value, -fruit, -freq) %>%
  # plot
  ggplot(aes(x = freq, y = value, fill = Percentage)) +
  geom_bar(position = "dodge", stat = "identity") +
  facet_grid(fruit ~ .) +
  geom_text(aes(label = paste0(value, "%")), position = position_dodge(.9), vjust = 0)
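Output plot: one panel per fruit, with two dodged bars for each eating frequency (the percentage within that fruit and the percentage in the entire dataset), each labelled with its value.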
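As a cross-check on the percentages behind the plot, the same numbers can be computed with base R's table() and prop.table(). This is only a sketch: it rebuilds the mock data inline so that it runs on its own, and the names counts, within_fruit and in_total are made up for illustration.

```r
# Rebuild the mock data inline (same rows as the sample data, without the ID column).
fruit <- rep(c("apple", "orange", "banana"), times = c(6, 5, 8))
freq <- c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times", "> 4 times",
          "3-4 times", "3-4 times", "1-2 times", "1-2 times", "1-2 times",
          "1-2 times", "3-4 times", "> 4 times", "> 4 times", "1-2 times",
          "3-4 times", "> 4 times", "1-2 times")
# Fix the level order so the columns read from lowest to highest frequency.
freq <- factor(freq, levels = c("1-2 times", "3-4 times", "> 4 times"))

# Cross-tabulate fruit by eating frequency.
counts <- table(fruit, freq)

# Row-wise proportions: share of each frequency within a fruit, in percent.
within_fruit <- round(prop.table(counts, margin = 1) * 100)

# Share of each frequency in the entire dataset, in percent.
in_total <- round(prop.table(table(freq)) * 100)

print(within_fruit["apple", ])  # 50, 33, 17
print(in_total)                 # 47, 32, 21
```

For apple this gives 3 of 6 rows (50%) in "1-2 times", while the whole dataset has 9 of 19 rows (47%) in that level, which should match the labels on the corresponding bars.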
Sample data:
df<- structure(list(ID = 1:19, fruit = c("apple", "apple", "apple",
"apple", "apple", "apple", "orange", "orange", "orange", "orange",
"orange", "banana", "banana", "banana", "banana", "banana", "banana",
"banana", "banana"), freq = c("1-2 times", "3-4 times", "1-2 times",
"3-4 times", "1-2 times", "> 4 times", "3-4 times", "3-4 times",
"1-2 times", "1-2 times", "1-2 times", "1-2 times", "3-4 times",
"> 4 times", "> 4 times", "1-2 times", "3-4 times", "> 4 times",
"1-2 times")), .Names = c("ID", "fruit", "freq"), class = "data.frame", row.names = c(NA,
-19L))
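If only one fruit should be compared against the entire dataset in a single panel, as described in the question, one possible approach is to compute the percentages for the subset and for the full data separately and stack the two results before plotting. The following is a minimal sketch under stated assumptions: freq_share() is a hypothetical helper, not part of any package, and the data frame is rebuilt from the per-level counts of the sample data (same counts, different row order) so the snippet runs on its own.

```r
library(dplyr)
library(ggplot2)

freq_levels <- c("1-2 times", "3-4 times", "> 4 times")

# Mock data rebuilt from its counts per fruit and frequency level.
df <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  freq = c(rep(freq_levels, times = c(3, 2, 1)),       # apple
           rep(freq_levels[1:2], times = c(3, 2)),     # orange
           rep(freq_levels, times = c(3, 2, 3))),      # banana
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of each frequency level within `data`,
# tagged with a label that is later used as the fill colour.
freq_share <- function(data, label) {
  data %>%
    count(freq) %>%
    mutate(group = label, perc = round(n / sum(n) * 100))
}

# Stack the shares for the apple subset and for the whole dataset.
plot_df <- bind_rows(
  freq_share(filter(df, fruit == "apple"), "apple"),
  freq_share(df, "entire dataset")
) %>%
  mutate(freq = factor(freq, levels = freq_levels))

# Two dodged bars per frequency level: apple vs. entire dataset.
p <- ggplot(plot_df, aes(x = freq, y = perc, fill = group)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = paste0(perc, "%")),
            position = position_dodge(width = 0.9), vjust = -0.3) +
  labs(x = "Eating frequency", y = "Percentage", fill = NULL)
print(p)
```

Changing "apple" in the filter() call switches the comparison to another fruit. Note that a level which never occurs for the chosen fruit (for example "> 4 times" for orange) produces no row in this sketch, so no bar is drawn for it.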