
dplyr: How to calculate frequency of different values within each group


I probably have a fairly easy question, but I cannot figure it out.

I have a dataset with two variables, both factors. It looks like this:

my.data<-data.frame(name=c("a","a","b","b","b","b", "b", "b", "e", "e", "e"),
                var1=c(1, 2, 3, 4, 2, 1, 4, 1, 3, 4, 3))

I would like to calculate the relative frequency of the values 1, 2, 3 and 4 within each name (a, b and e) and collect the result in one row per name. That is, "a", "b" and "e" should each occupy a single row, with four new variables giving the frequency of 1, 2, 3 and 4 for that name. I have managed to calculate the frequencies for every combination of name and value, but I can't collapse the result into one row per name.

My code is this one:

library(dplyr)

a <- my.data %>%
  dplyr::select(name, var1) %>%
  mutate(name = as.factor(name),
         var1 = as.factor(var1)) %>%
  group_by(name, var1) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

My results should look like this:

name   Freq1   Freq2   Freq3   Freq4
a      0.00    0.00    0.50    0.50
b      0.30    0.30    0.30    0.10
e      0.20    0.20    0.20    0.40

Thanks.


Solution

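  • The janitor package handles this directly: tabyl() cross-tabulates name against var1, and adorn_percentages() converts the counts to row-wise proportions (its default denominator):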

    library(janitor)
    
    my.data %>%
      tabyl(name, var1) %>%
      adorn_percentages()
    
     name         1         2         3         4
        a 0.5000000 0.5000000 0.0000000 0.0000000
        b 0.3333333 0.1666667 0.1666667 0.3333333
        e 0.0000000 0.0000000 0.6666667 0.3333333
    

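    Or, with totals and percentage formatting added. Because adorn_totals() runs after adorn_percentages() here, the Total row sums the per-group percentages rather than showing overall shares: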

    my.data %>%
      tabyl(name, var1) %>%
      adorn_percentages() %>%
      adorn_totals(c('row', 'col')) %>%
      adorn_pct_formatting(2)
    
      name      1      2      3      4   Total
         a 50.00% 50.00%  0.00%  0.00% 100.00%
         b 33.33% 16.67% 16.67% 33.33% 100.00%
         e  0.00%  0.00% 66.67% 33.33% 100.00%
     Total 83.33% 66.67% 83.33% 66.67% 300.00%
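
For a route that stays closer to the dplyr code in the question, the long table of frequencies can be reshaped to one row per name with tidyr. This is only a minimal sketch, assuming tidyr (version 1.0 or later, for pivot_wider()) is installed. The name freq_wide and the "Freq" column prefix are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

my.data <- data.frame(
  name = c("a", "a", "b", "b", "b", "b", "b", "b", "e", "e", "e"),
  var1 = c(1, 2, 3, 4, 2, 1, 4, 1, 3, 4, 3)
)

freq_wide <- my.data %>%
  count(name, var1) %>%            # one row per observed (name, var1) pair
  group_by(name) %>%
  mutate(freq = n / sum(n)) %>%    # share of each value within its name
  ungroup() %>%
  select(-n) %>%
  arrange(var1) %>%                # keeps the new columns in the order 1, 2, 3, 4
  pivot_wider(
    names_from   = var1,
    values_from  = freq,
    names_prefix = "Freq",         # hypothetical prefix, matching the question's layout
    values_fill  = 0               # combinations that never occur become 0
  ) %>%
  arrange(name)

freq_wide

# Base R cross-check: row-wise proportions of the contingency table
prop.table(table(my.data$name, my.data$var1), margin = 1)
```

Both calls should give the same proportions as the tabyl() output above, with each row summing to 1.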