Search code examples
rdplyrtidyversetidyr

summarise based on multiple groups in R dplyr


I have a large data frame that looks like this

library(tidyverse)

df <- tibble(id=c(1,1,2,2,2,3), counts=c(10,20,15,15,10,20), fruit=c("apple","banana","cherry","cherry","ananas","pear"))
df
#> # A tibble: 6 × 3
#>      id counts fruit 
#>   <dbl>  <dbl> <chr> 
#> 1     1     10 apple 
#> 2     1     20 banana
#> 3     2     15 cherry
#> 4     2     15 cherry
#> 5     2     10 ananas
#> 6     3     20 pear

Created on 2022-04-13 by the reprex package (v2.0.1)

For each id, I want to keep the fruit with the maximum counts and then I want to add the sum_counts of unique fruits per id in another column.

I want my data to look like this:

# A tibble: 3 × 4
     id central_fruit fruits         sum_counts
  <dbl> <chr>         <chr>               <dbl>
1     1 banana        banana, apple          30
2     2 cherry        cherry, ananas         30
3     3 pear          pear                   20

This is what I have tried so far and I do not know why I fail miserably

library(tidyverse)

df <- tibble(id=c(1,1,2,2,2,3), counts=c(10,20,15,15,15,20), fruit=c("apple","banana","cherry","cherry","ananas","pear"))

df %>% 
  group_by(id,fruit) %>% 
  add_count(fruit) %>% 
  ungroup() %>% 
  group_by(id) %>% 
  summarise(central_fruit=fruit[which.max(counts)],
            fruits = toString(sort(unique(fruit), decreasing = TRUE)),
            sum_counts = sum(unique(counts)))
#> # A tibble: 3 × 4
#>      id central_fruit fruits         sum_counts
#>   <dbl> <chr>         <chr>               <dbl>
#> 1     1 banana        banana, apple          30
#> 2     2 cherry        cherry, ananas         15
#> 3     3 pear          pear                   20

Created on 2022-04-13 by the reprex package (v2.0.1)


Solution

  • Here's a dplyr approach.

    library(dplyr)
    
    df <- tibble(id=c(1,1,2,2,2,3), counts=c(10,20,15,15,10,20), fruit=c("apple","banana","cherry","cherry","ananas","pear"))
    
    df %>% 
      group_by(id) %>% 
      mutate(fruits = paste0(unique(fruit), collapse = ", "),
             sum_counts = sum(unique(counts))) %>% 
      filter(counts == max(counts)) %>% 
      distinct() %>% 
      rename("central_fruit" = "fruit") %>% 
      select(-counts)
    #> # A tibble: 3 × 4
    #> # Groups:   id [3]
    #>      id central_fruit fruits         sum_counts
    #>   <dbl> <chr>         <chr>               <dbl>
    #> 1     1 banana        apple, banana          30
    #> 2     2 cherry        cherry, ananas         25
    #> 3     3 pear          pear                   20
    

    Created on 2022-04-13 by the reprex package (v2.0.1)