so I have a df that roughly looks like this
| x1| x2| x3| x4|
| a | 5 | NA| 0 |
| a | 0 | 7 | 0 |
| a | NA| 12| 0 |
| b | NA| 6 | 5 |
| b | NA| 3 | 0 |
and I'm trying to sum up all the variables from the same row's based on the x1 column value, while still having the NA present if there really is nothing there
what I want is a DF that looks like the following
| x1| x2| x3| x4|
| a | 5 | 19| 0 |
| b | NA| 9 | 5 |
where totals sum up but the values with NA (such as x2 for b in this case) stay
I tried this
aggregate(df[,-1], df["x1"], FUN=sum)
and it gives me
x1| x2| x3| x4|
a | NA| NA| 0 |
b | NA| 9 | 5 |
erasing the summed values entirely where there's a NA value in the column
I also tried
aggregate(df[,-1], df["x1"], FUN=sum,na.rm=TRUE, na.action=NULL)
which gives me
| x1| x2| x3| x4|
| a | 5 | 19| 0 |
| b | 0 | 9 | 5 |
and turns all the NA's to zero's thus erasing that aspect of the data
does anyone know a way around this, or a different function I could utilize to accomplish the same thing?
One solution would be to check whether all values are NA and return NA if this is indeed the case but calculate the sum if not, i.e., to use this as the function:
> aggregate(df.demo[, -1], df.demo["x1"], FUN = \(x) if (all(is.na(x))) NA else sum(x, na.rm = T))
# output
# x1 x2 x3 x4
# 1 a 5 19 0
# 2 b NA 9 5