Search code examples
rgroup-byaggregatesummarize

group_by and count number of elements in each column in R


I have a data table like below:

city         year    t_20   t_25 
Seattle      2019    82      91  
Seattle      2018     0      103   
NYC          2010    78       8 
DC           2011    71       0  
DC           2011     0       0    
DC           2018    60       0

I would like to group them by city and year and count the number of zeros in each group.

How can I do this? by summarize_at?

df %>% group_by(city, year) %>% summarise_at( WHAT GOES HERE , vars(t_20:t_25))

What should be the first argument of summarize_at?

or any other way? tally?


Solution

  • An option is to reshape from wide to long before summariseing

    library(tidyverse)
    df %>%
        gather(k, v, -city, -year) %>%
        group_by(city, year) %>%
        summarise(n_0 = sum(v == 0)) 
    #    # A tibble: 5 x 3
    ## Groups:   city [?]
    #  city     year   n_0
    #  <fct>   <int> <int>
    #1 DC       2011     3
    #2 DC       2018     1
    #3 NYC      2010     0
    #4 Seattle  2018     1
    #5 Seattle  2019     0
    

    To summarise for each column separate you can do

    df %>%
        group_by(city, year) %>%
        summarise_all(funs(sum(. == 0)))
    ## A tibble: 5 x 4
    ## Groups:   city [?]
    #  city     year  t_20  t_25
    #  <fct>   <int> <int> <int>
    #1 DC       2011     1     2
    #2 DC       2018     0     1
    #3 NYC      2010     0     0
    #4 Seattle  2018     1     0
    #5 Seattle  2019     0     0
    

    Sample data

    df <- read.table(text =
        "city         year    t_20   t_25
    Seattle      2019    82      91
    Seattle      2018     0      103
    NYC          2010    78       8
    DC           2011    71       0
    DC           2011     0       0
    DC           2018    60       0", header = T)