
Compare values of a single category to all (including category) in R


I am trying to use R to create a bar chart that compares the frequency distribution of one category with that of the entire dataset. I created some mock data, similar to the real data, along with my expected output. The mock data includes three fruits (apple, orange, banana), each with an eating frequency (1-2 times, 3-4 times, > 4 times). Mock data:

ID  Fruit   frequency
1   apple   1-2 times
2   apple   3-4 times
3   apple   1-2 times
4   apple   3-4 times
5   apple   1-2 times
6   apple   > 4 times
7   orange  3-4 times
8   orange  3-4 times
9   orange  1-2 times
10  orange  1-2 times
11  orange  1-2 times
12  banana  1-2 times
13  banana  3-4 times
14  banana  > 4 times
15  banana  > 4 times
16  banana  1-2 times
17  banana  3-4 times
18  banana  > 4 times
19  banana  1-2 times

The expected output is a bar chart with three groups, one per eating frequency (1-2 times, 3-4 times, > 4 times). Each group has two columns: one represents "apple" and the other represents the entire dataset.

I can create the frequency bar chart for each category (such as apple), but I don't know how to add the entire dataset for comparison.

Any suggestion on which code to use or which approach to take (subsetting "apple", maybe?) would be much appreciated!



Solution

  • First I calculated both percentages (within each fruit and across the whole dataset), then reshaped the data into a plot-friendly long format. The data frame df is defined under "Sample data" below.

    library(ggplot2)
    library(dplyr)
    library(tidyr)
    
    df %>%
      group_by(fruit) %>%
      mutate(countF = n()) %>%
      # dplyr >= 1.0.0 spells this argument .add (older versions used add)
      group_by(freq, .add = TRUE) %>%
      # frequency percentage within fruit
      mutate(freq_perc_within_fruit = round(n() / countF * 100)) %>%
      group_by(freq) %>%
      # frequency percentage in total (nrow(.) is the row count of the whole data frame)
      mutate(freq_perc_in_total = round(n() / nrow(.) * 100)) %>%
      ungroup() %>%
      select(fruit, freq, freq_perc_within_fruit, freq_perc_in_total) %>%
      # keep one row per fruit/frequency pair so bars and labels are not drawn repeatedly
      distinct() %>%
      gather(Percentage, value, -fruit, -freq) %>%
      # plot
      ggplot(aes(x = freq, y = value, fill = Percentage)) +
        geom_bar(position = "dodge", stat = "identity") +
        facet_grid(fruit ~ .) +
        geom_text(aes(label = paste0(value, "%")), position = position_dodge(.9), vjust = 0)
    

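    The output plot (image not preserved) has one facet per fruit. Within each facet, every frequency group shows two dodged bars: the percentage within that fruit and the percentage in the whole dataset.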
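
    The numbers behind the plot can be cross-checked with base R. This is a minimal sketch, assuming the same 19 rows as the sample data; the names within_fruit and in_total are illustrative and not part of the original answer.

    ```r
    # Rebuild the mock data (same 19 rows as the sample data, without the ID column).
    df <- data.frame(
      fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
      freq  = c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times",
                "> 4 times", "3-4 times", "3-4 times", "1-2 times", "1-2 times",
                "1-2 times", "1-2 times", "3-4 times", "> 4 times", "> 4 times",
                "1-2 times", "3-4 times", "> 4 times", "1-2 times"),
      stringsAsFactors = FALSE
    )

    # Rounded share of each frequency level within each fruit
    # (margin = 1 makes every row sum to 100 before rounding).
    within_fruit <- round(100 * prop.table(table(df$fruit, df$freq), margin = 1))

    # Rounded share of each frequency level in the whole dataset.
    in_total <- round(100 * prop.table(table(df$freq)))

    print(within_fruit)  # apple row: 17 (> 4 times), 50 (1-2 times), 33 (3-4 times)
    print(in_total)      # 21 (> 4 times), 47 (1-2 times), 32 (3-4 times)
    ```

    Unlike the dplyr pipeline, table() also reports combinations that never occur (orange, "> 4 times") as 0 instead of dropping them.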

    Sample data:

    df<- structure(list(ID = 1:19, fruit = c("apple", "apple", "apple", 
    "apple", "apple", "apple", "orange", "orange", "orange", "orange", 
    "orange", "banana", "banana", "banana", "banana", "banana", "banana", 
    "banana", "banana"), freq = c("1-2 times", "3-4 times", "1-2 times", 
    "3-4 times", "1-2 times", "> 4 times", "3-4 times", "3-4 times", 
    "1-2 times", "1-2 times", "1-2 times", "1-2 times", "3-4 times", 
    "> 4 times", "> 4 times", "1-2 times", "3-4 times", "> 4 times", 
    "1-2 times")), .Names = c("ID", "fruit", "freq"), class = "data.frame", row.names = c(NA, 
    -19L))
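
    The question asked for a single chart with two bars per frequency group: "apple" next to the entire dataset. The faceted plot above shows every fruit, so here is a hedged sketch of that narrower layout. It assumes the same mock data; the names target, pct and plot_df are hypothetical helpers introduced for illustration, not part of the original answer.

    ```r
    library(ggplot2)

    # Rebuild the mock data (same 19 rows as the sample data, without the ID column).
    df <- data.frame(
      fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
      freq  = c("1-2 times", "3-4 times", "1-2 times", "3-4 times", "1-2 times",
                "> 4 times", "3-4 times", "3-4 times", "1-2 times", "1-2 times",
                "1-2 times", "1-2 times", "3-4 times", "> 4 times", "> 4 times",
                "1-2 times", "3-4 times", "> 4 times", "1-2 times"),
      stringsAsFactors = FALSE
    )

    # Fix the order of the frequency levels so the x axis reads low to high.
    freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
    df$freq <- factor(df$freq, levels = freq_levels)

    # Rounded percentage of each frequency level in a factor (zero counts are kept).
    pct <- function(x) as.numeric(round(100 * prop.table(table(x))))

    target <- "apple"  # the category to compare against the whole dataset

    # One row per (frequency level, group): the target fruit and the entire dataset.
    plot_df <- data.frame(
      freq  = factor(rep(freq_levels, 2), levels = freq_levels),
      group = rep(c(target, "entire dataset"), each = length(freq_levels)),
      value = c(pct(df$freq[df$fruit == target]), pct(df$freq))
    )

    p <- ggplot(plot_df, aes(x = freq, y = value, fill = group)) +
      geom_col(position = "dodge") +
      geom_text(aes(label = paste0(value, "%")),
                position = position_dodge(width = 0.9), vjust = -0.3) +
      labs(x = "Eating frequency", y = "Percentage", fill = NULL)

    print(p)
    ```

    Note that the "entire dataset" bars include the apple rows, as the question title requests. Changing target to "orange" or "banana" gives the same comparison for another fruit.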