r, group-by, dplyr, summarize

How to keep other columns when using dplyr?


I have a problem similar to the one described in "How to aggregate some columns while keeping other columns in R?", but none of the solutions I tried from there work.

I have a data frame like this:

df <- data.frame(a = rep(c("a", "b"), each = 2), b = c(500, 400, 200, 300),
                 c = c(5, 10, 2, 4), stringsAsFactors = FALSE)
> df
  a   b  c
1 a 500  5
2 a 400 10
3 b 200  2
4 b 300  4

library(dplyr)

df %>%
  group_by(a) %>%
  summarise(max = max(c), sum = sum(c))

  a       max   sum
  <chr> <dbl> <dbl>
1 a        10    15  
2 b         4     6

but I also need column b:

1 a        10    15   400
2 b         4     6   300

The value for column b should come from the row where c equals max(c) within each group.


Edit: data for a specific case, where the maximum of c is tied:

> df
  a   b  c
1 a 500  5
2 a 400  5

In this case, I need the higher value of column b in the summary:

#   a       max   sum     b
#   <chr> <dbl> <dbl> <dbl>
# 1 a         5    10   500

Solution

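  • You have to specify how to summarize the variable b. Here, b is taken from the rows where c equals its group maximum, and max() picks the larger b if several rows tie: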

    df %>%
      group_by(a) %>%
      summarise(max = max(c), sum = sum(c), b = max(b[c == max(c)]))
    
    # # A tibble: 2 x 4
    #   a       max   sum     b
    #   <chr> <dbl> <dbl> <dbl>
    # 1 a        10    15   400
    # 2 b         4     6   300
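
To check the tie case from the question's edit, the same summarise() call can be run on that two-row data frame. This is only a minimal sketch: the names df_tie and res_tie are made up for illustration, and the data is copied from the edit above.

```r
library(dplyr)

# Tie case from the question's edit: both rows have the same value of c.
# df_tie and res_tie are illustrative names, not part of the original post.
df_tie <- data.frame(a = c("a", "a"), b = c(500, 400), c = c(5, 5),
                     stringsAsFactors = FALSE)

res_tie <- df_tie %>%
  group_by(a) %>%
  # b[c == max(c)] keeps both b values (500 and 400) because c is tied;
  # the outer max() then picks the larger one.
  summarise(max = max(c), sum = sum(c), b = max(b[c == max(c)]))

res_tie
#   a       max   sum     b
#   <chr> <dbl> <dbl> <dbl>
# 1 a         5    10   500
```

The outer max() is what resolves the tie; replacing it with min() or first() would return a different b for tied rows, so choose the function that matches the tie-breaking rule you want.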