Search code examples
rdata-manipulation

Get maximum value from dataset based aggregate number from another column


I have the dataset as follows, which group by ID:

ID, Activity, Duration
1, Reading,     20
1, Work,        40
1, Reading,     30
2, Home,        50
2, Writing,     30
2, Reading,     20
2, Writing,     30

And I want to get another column that tells us the activity with the highest duration, so person 1 should be reading because it takes 50 minutes, and person 2 should be writing because it takes 60 minutes. Below is an example of the desired output.

ID, Activity, Duration, Max_Actitvity
    1, Reading, 20, Reading
    1, Work,    40, Reading
    1, Reading, 30, Reading
    2, Home,    50, Writing
    2, Writing, 30, Writing
    2, Reading, 20, Writing
    2, Writing, 30, Writing

Solution

  • You can use the following code:

    df <- read.table(text = "ID, Activity, Duration
    1, Reading, 20
    1, Work, 40
    1, Reading, 30
    2, Home, 50
    2, Writing, 30
    2, Reading, 20
    2, Writing, 30", header = TRUE, sep = ",")
    
    library(dplyr)
    df %>%
      group_by(ID, Activity) %>%
      mutate(sum_Activity = sum(Duration)) %>%
      group_by(ID) %>%
      mutate(Max_Activity = Activity[which.max(sum_Activity)]) %>%
      select(-sum_Activity) %>%
      ungroup() 
    #> # A tibble: 7 × 4
    #>      ID Activity   Duration Max_Activity
    #>   <int> <chr>         <int> <chr>       
    #> 1     1 " Reading"       20 " Reading"  
    #> 2     1 " Work"          40 " Reading"  
    #> 3     1 " Reading"       30 " Reading"  
    #> 4     2 " Home"          50 " Writing"  
    #> 5     2 " Writing"       30 " Writing"  
    #> 6     2 " Reading"       20 " Writing"  
    #> 7     2 " Writing"       30 " Writing"
    

    Created on 2022-07-21 by the reprex package (v2.0.1)