r group-by count summarize

How to use group_by() and summarize() to count the occurrences of data points?


p <- data.frame(x = c("A", "B", "C", "A", "B"), 
                y = c("A", "B", "D", "A", "B"), 
                z = c("B", "C", "B", "D", "E"))
p

d <- p %>%  
  group_by(x) %>% 
  summarize(occurance1 = count(x),
            occurance2 = count(y),
            occurance3 = count(z),
            total = occurance1 + occurance2 + occurance3)
d

Output:

A tibble: 3 x 5
  x     occurance1 occurance2 occurance3 total
  <chr>      <int>      <int>      <int> <int>
1 A              2          2          1     5
2 B              2          2          1     5
3 C              1          1          1     3

I have a dataset similar to the one above, and I'm trying to count the occurrences of each value in each column. The first count works perfectly, probably because the data is grouped by x, but I've run into problems with the other two columns. As you can see, "D" isn't counted at all in y; it seems to be counted as "C" instead. Likewise, z doesn't contain an "A", yet A shows a count of 1. Help?


Solution

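  • dplyr's count() expects a data.frame/tibble as its first argument, not a vector, so it cannot be applied to individual columns inside summarize(). To make this work, we can reshape the data to 'long' format with pivot_longer(), apply count() to the resulting name and value columns, reshape back to 'wide' format with pivot_wider(), and then use adorn_totals() from janitor to add the Total column.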

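    library(dplyr)
    library(tidyr)
    library(janitor)
    p %>%
        # stack every column into name/value pairs
        pivot_longer(cols = everything()) %>%
        # count each value within each original column
        count(name, value) %>%
        # spread the values back out, one column per value
        pivot_wider(names_from = value, values_from = n, values_fill = 0) %>%
        # append a Total column with the row sums
        janitor::adorn_totals('col')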
    

    -output

      name A B C D E Total
         x 2 2 1 0 0     5
         y 2 2 0 1 0     5
         z 0 2 1 1 1     5
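
If the layout from the question is preferred (one row per value, one count column per original column, plus a total), the same reshaping idea can be turned around. The following is only a sketch under that assumption: it uses dplyr and tidyr alone, the name counts_by_value is made up for illustration, and the total is computed by adding the x, y and z count columns by name, so it would need adjusting for other column names.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"))

counts_by_value <- p %>%
  # stack all columns into (name, value) pairs
  pivot_longer(cols = everything()) %>%
  # count() receives the data frame through the pipe
  count(value, name) %>%
  # one row per value, one count column per original column
  pivot_wider(names_from = name, values_from = n, values_fill = 0) %>%
  # row-wise total across the three count columns
  mutate(total = x + y + z)

counts_by_value
```

With this data, "D" gets a count of 0 in x and 1 in both y and z, and "A" gets 0 in z, which is the per-column count the question was after.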