
aggregate function in R: sum of NAs is 0


I have seen several Stack Overflow questions about this, but none had a satisfactory answer. This follows up on the question "Blend of na.omit and na.pass using aggregate?".

> test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
  var1 = rep(c(1:3, NA), 3),
  var2 = 1:12,
  var3 = c(rep(NA, 4), 1:8))

> test
   name var1 var2 var3
1     A    1    1   NA
2     A    2    2   NA
3     A    3    3   NA
4     A   NA    4   NA
5     B    1    5    1
6     B    2    6    2
7     B    3    7    3
8     B   NA    8    4
9     C    1    9    5
10    C    2   10    6
11    C    3   11    7
12    C   NA   12    8

When I try the solution given there but compute the sum instead of the mean,

aggregate(. ~ name, test, FUN = sum, na.action=na.pass, na.rm=TRUE)

it does not behave as I expected: for a group whose values are all NA (var3 for name A), the sum of the NAs comes out as 0. It displays 0 instead of NaN.

Why doesn't this work for FUN = sum, and how can I make it work?
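A quick check in plain R shows where the difference comes from. With na.rm = TRUE, both mean and sum first drop the NAs, so an all-NA group leaves an empty vector. The mean of an empty vector is NaN (a zero total divided by zero elements), while its sum is 0. A minimal sketch:

```r
# An all-NA group, like var3 for name "A"
x <- c(NA, NA, NA, NA)

# na.rm = TRUE drops every element, leaving an empty vector
print(sum(x, na.rm = TRUE))   # 0: the sum of an empty set is zero
print(mean(x, na.rm = TRUE))  # NaN: zero total divided by zero elements
```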


Solution

  • Use an anonymous function that returns NaN when all elements of a group are NA:

    aggregate(. ~ name, test,
              FUN = function(x) if (all(is.na(x))) NaN else sum(x, na.rm = TRUE),
              na.action = na.pass)
    

    Output:

      name var1 var2 var3
    1    A    6   10  NaN
    2    B    6   26   10
    3    C    6   42   26
    

    This is expected behavior for sum with na.rm = TRUE. From ?sum:

    the sum of an empty set is zero, by definition.

    > sum(c(NA, NA), na.rm = TRUE)
    [1] 0
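If you need this in several places, the anonymous function can be given a name. The sketch below assumes a hypothetical helper called sum_or_nan (it is not part of base R). It also uses the data-frame method of aggregate, which has no na.action step, so rows containing NA reach FUN without na.action = na.pass; the formula method still needs na.pass, because its default na.action drops those rows first.

```r
# Hypothetical helper (not base R): like sum(x, na.rm = TRUE), but returns
# NaN instead of 0 when every value in the group is NA
sum_or_nan <- function(x) {
  if (all(is.na(x))) NaN else sum(x, na.rm = TRUE)
}

test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
                   var1 = rep(c(1:3, NA), 3),
                   var2 = 1:12,
                   var3 = c(rep(NA, 4), 1:8))

# Data-frame method: no na.action step, so NA rows reach sum_or_nan
res <- aggregate(test[-1], by = list(name = test$name), FUN = sum_or_nan)
print(res)

# Formula method: na.action = na.pass is still required here
res_formula <- aggregate(. ~ name, test, FUN = sum_or_nan, na.action = na.pass)
print(res_formula)
```

Both calls should produce the same values as the output shown above, with NaN for var3 in group A.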