Aggregate works with complete cases


I would like to aggregate multiple columns by a character vector. However, I think base R's aggregate works on complete cases only, so if a row has a missing value in any one variable, that row is excluded from every sum. This example illustrates the issue:

v1 <- c(1, 1, 1, NA)
v2 <- c(2, 2, NA, 2)
v3 <- c("A", "B", "A", "B")

df <- data.frame(v1, v2, v3)

aggregate(.~v3, data=df, FUN=sum)

This gives the output:

  v3 v1 v2
1  A  1  2
2  B  1  2

The output I was expecting is:

  v3 v1 v2
1  A  2  2
2  B  1  4

So v1 sums to 3 and v2 sums to 6 in total, as they do in df.

Is there an argument to aggregate that I can change to produce my desired output? Thanks.


Solution

  • aggregate has an argument na.action

    na.action: a function which indicates what should happen when the data contain ‘NA’ values. The default is to ignore missing values in the given variables.

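    In the formula method this argument defaults to na.omit, which drops every row containing an NA before any group is summed. Passing na.action=na.pass keeps those rows, and na.rm=TRUE inside the summary function then skips the NAs one column at a time, which gives the result you're expecting: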

    aggregate(. ~ v3, df, \(x) sum(x, na.rm=TRUE), na.action=na.pass)
      v3 v1 v2
    1  A  2  2
    2  B  1  4
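
To see why the original call lost rows, and as one possible alternative, here is a small sketch using the data from the question. It first shows what na.omit leaves behind, then uses the data-frame method of aggregate (x plus by instead of a formula), which as far as I know has no na.action step, so na.rm=TRUE can be passed straight through to sum. The names kept and res are only for illustration.

```r
# Data from the question
v1 <- c(1, 1, 1, NA)
v2 <- c(2, 2, NA, 2)
v3 <- c("A", "B", "A", "B")
df <- data.frame(v1, v2, v3)

# What the formula method's default na.action (na.omit) does first:
# rows 3 and 4 each contain an NA, so only rows 1 and 2 survive.
kept <- na.omit(df)
print(kept)

# Alternative: the data-frame method takes the columns to summarise and a
# named list of grouping vectors. NAs reach FUN untouched, and na.rm = TRUE
# (passed through ...) removes them separately for each column.
res <- aggregate(df[c("v1", "v2")], by = list(v3 = df$v3), FUN = sum, na.rm = TRUE)
print(res)
```

With this data, res should match the expected table above (A: 2 and 2, B: 1 and 4). Note that the data-frame method still drops rows where the grouping variable itself is NA.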