Search code examples
raggregatesummarize

aggregate in r is removing 0 count in table. How to make it show?


I'm trying to generate a table with aggregate for further processing using FUN argument in aggregate. Here's my code

var <- c(1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0)

aggregate(var, list(group), FUN = table)

but, the result is showing up like this where group 4 has no value for the second level (0 or 1)

  Group.1      x
1       1  61, 3
2       2 112, 3
3       3  30, 2
4       4     21

The desired result should be like this:

  Group.1  x.0  x.1
1       1  61   3
2       2 112   3
3       3  30   2
4       4  21   0

How do I fix this?


Solution

  • We need to create the 'var' as factor with levels specified as 0 and 1. This would make sure that if the count is 0 for a particular column to show up as 0

    var <- factor(var, levels = 0:1)
    out <- aggregate(var, list(group), FUN = table)
    out
    #   Group.1 x.0 x.1
    #1       1  29  21
    #2       2  29  21
    #3       3  23  27
    #4       4   0  50
    

    Or use the formula method

    out <- aggregate(var ~ group, FUN = table)
    

    Note that these will result in a matrix 'x' with two columns. Inorder to have as regular data.frame columns

    do.call(data.frame, out)
    

    Or in case, for a change, we can also get the sum of 1s and then reshape

    reshape(aggregate(cbind(n = rep(1, length(group))) ~ 
      group+ var, FUN = sum), idvar = 'group', direction = 'wide', timevar = 'var')
    

    If we are using tidyverse, instead of doing any change in the class i.e. converting to factor, do a count using both the variables and then spread it to 'wide' format

    library(tidyverse)
    tibble(var, group) %>% 
        count(var, group) %>%
        spread(var, n, fill = 0)
    # A tibble: 4 x 3
    #  group   `0`   `1`
    #  <int> <dbl> <dbl>
    #1     1    29    21
    #2     2    29    21
    #3     3    23    27
    #4     4     0    50
    

    data

    set.seed(24)
    var <- sample(0:1, 200, replace = TRUE)
    var[151:200] <- 1
    group <- rep(1:4, each = 50)