Search code examples
rsumaggregatena

trying to aggregate and sum som valus that include NA in R, but not loose the NA if there are no variables in that column


so I have a df that roughly looks like this

| x1| x2| x3| x4|

| a | 5 | NA| 0 |

| a | 0 | 7 | 0 |

| a | NA| 12| 0 |

| b | NA| 6 | 5 |

| b | NA| 3 | 0 |

and I'm trying to sum up all the variables from the same row's based on the x1 column value, while still having the NA present if there really is nothing there

what I want is a DF that looks like the following

| x1| x2| x3| x4|

| a | 5 | 19| 0 |

| b | NA| 9 | 5 |

where totals sum up but the values with NA (such as x2 for b in this case) stay

I tried this

aggregate(df[,-1], df["x1"], FUN=sum)

and it gives me

x1| x2| x3| x4|

a | NA| NA| 0 |

b | NA| 9 | 5 |

erasing the summed values entirely where there's a NA value in the column

I also tried

aggregate(df[,-1], df["x1"], FUN=sum,na.rm=TRUE, na.action=NULL)

which gives me

| x1| x2| x3| x4|

| a | 5 | 19| 0 |

| b | 0 | 9 | 5 |

and turns all the NA's to zero's thus erasing that aspect of the data

does anyone know a way around this, or a different function I could utilize to accomplish the same thing?


Solution

  • One solution would be to check whether all values are NA and return NA if this is indeed the case but calculate the sum if not, i.e., to use this as the function:

    > aggregate(df.demo[, -1], df.demo["x1"], FUN = \(x) if (all(is.na(x))) NA else sum(x, na.rm = T))
    # output
    #   x1 x2 x3 x4
    # 1  a  5 19  0
    # 2  b NA  9  5