Search code examples
rdataframemergeaggregatena

How to aggregate by names in two columns in R?


I got a Data Frame:

a <- c('A','A','B','B','A')
b <- c(1,1,1,1,2)
c <- c(NA,60,NA,100,NA)
d <- c(10,NA,10,NA,100)

frame <- data.frame(a,b,c,d)

> frame
  a  b   c  d
1 A  1  NA  10
2 A  1  60  NA
3 B  1  NA  10
4 B  1 100  NA
5 A  2  NA  100

And I want to aggregate it by a and b

>frame2
  a  b   c  d
1 A  1  60  10
3 B  1 100  10
5 A  2  NA  100

I tried several things like aggregat() and group from dplyr but somehow it never works. I guess the NA is a problem.


Solution

  • With aggregate, we may need to use na.action

    aggregate(.~ a + b, frame, sum, na.rm = TRUE, na.action = 'na.pass')
    #   a b   c   d
    #1 A 1  60  10
    #2 B 1 100  10
    #3 A 2   0 100
    

    If we intend to subset the rows

    library(dplyr)
    frame %>% 
        group_by(a, b) %>%
        mutate_at(vars(-group_cols()), ~ .[order(is.na(.))]) %>% 
        slice(1)
    # A tibble: 3 x 4
    # Groups:   a, b [3]
    #  a         b     c     d
    #  <fct> <dbl> <dbl> <dbl>
    #1 A         1    60    10
    #2 A         2    NA   100
    #3 B         1   100    10