I would like to aggregate multiple columns by a character vector. However, I think that base R aggregate works only on complete cases, so if a value is missing in one column, that whole row is excluded from the sums of all columns. This illustrates the issue:
v1 <- c(1, 1, 1, NA)
v2 <- c(2, 2, NA, 2)
v3 <- c("A", "B", "A", "B")
df <- data.frame(v1, v2, v3)
aggregate(.~v3, data=df, FUN=sum)
With the output
v3 v1 v2
1 A 1 2
2 B 1 2
The output I was expecting is
v3 v1 v2
1 A 2 2
2 B 1 4
so that v1 sums to 3 and v2 sums to 6 overall, matching the column totals in df.
Is there a change to aggregate that I can use to produce my desired output? Thanks.
aggregate has an argument na.action
na.action: a function which indicates what should happen when the data contain ‘NA’ values. The default is to ignore missing values in the given variables.
which gives the result you're expecting when set to na.pass, as long as the summary function itself drops the NAs (here via na.rm = TRUE):
aggregate(. ~ v3, df, \(x) sum(x, na.rm = TRUE), na.action = na.pass)
v3 v1 v2
1 A 2 2
2 B 1 4
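To see why the original call dropped values, here is a minimal sketch that reuses the df from the question. It compares the rows that survive na.omit (the default na.action of the formula method) with the result of the na.pass call. Forwarding na.rm = TRUE through aggregate's ... argument instead of wrapping sum in an anonymous function is an alternative spelling I'm assuming you may prefer; it should also work on R versions older than 4.1, which lack the \(x) shorthand. The names complete_rows and res are only illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  v1 = c(1, 1, 1, NA),
  v2 = c(2, 2, NA, 2),
  v3 = c("A", "B", "A", "B")
)

# The formula method defaults to na.action = na.omit, so only rows
# with no NA in any column ever reach FUN
complete_rows <- na.omit(df)

# Keep every row with na.pass, then let sum() drop NAs column by column;
# na.rm = TRUE is forwarded to sum() through aggregate's `...`
res <- aggregate(. ~ v3, data = df, FUN = sum, na.rm = TRUE, na.action = na.pass)

print(complete_rows)
print(res)
```

Only the first two rows are complete, which is why the default call returned 1 and 2 for both groups. With na.pass each column is summed over its own non-missing values.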