
Group by one column and operate on another column in R


I'm looking to perform an operation on one column, grouped by the values of another column.

Say I have the following data:

user <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
score <- c(1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1)
time_1 <- c(130, NA, 120, 245, NA, NA, NA, 841, NA, NA, 721, 612)
time_2 <- c(NA, 742, NA, NA, 812, 212, 214, NA, 919, 528, NA, NA)
df <- data.frame(user, score, time_1, time_2) 

We get the following df:

   user score time_1 time_2
    1     1    130     NA
    1     0     NA    742
    1     1    120     NA
    1     1    245     NA
    2     0     NA    812
    2     0     NA    212
    2     0     NA    214
    2     1    841     NA
    3     0     NA    919
    3     0     NA    528
    3     1    721     NA
    3     1    612     NA

For each user (e.g. user 1), what is the smallest value of time_1? In other words, I want to group the rows by user and then apply an operation to the time_1 column within each group.
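For a single user, the computation is just a subset followed by min() with NAs removed. A minimal base R sketch for user 1, rebuilding only the two columns it needs from the question's data:

```r
# Only the two relevant columns from the question's data.
user   <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
time_1 <- c(130, NA, 120, 245, NA, NA, NA, 841, NA, NA, 721, 612)
df <- data.frame(user, time_1)

# Keep user 1's time_1 values, then ignore the NAs inside min().
user1_min <- min(df$time_1[df$user == 1], na.rm = TRUE)
user1_min  # 120
```

The grouped solutions below do the same thing for every user at once.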


Solution

  • Update on OP's request (see comments): to keep every row and add the per-user minimum as a new column, just replace summarise() with mutate():

    library(dplyr)
    df %>% 
      group_by(user) %>% 
      mutate(Smallest_time1 = min(time_1, na.rm = TRUE))
    
        user score time_1 time_2 Smallest_time1
       <dbl> <dbl>  <dbl>  <dbl>          <dbl>
     1     1     1    130     NA            120
     2     1     0     NA    742            120
     3     1     1    120     NA            120
     4     1     1    245     NA            120
     5     2     0     NA    812            841
     6     2     0     NA    212            841
     7     2     0     NA    214            841
     8     2     1    841     NA            841
     9     3     0     NA    919            612
    10     3     0     NA    528            612
    11     3     1    721     NA            612
    12     3     1    612     NA            612
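If you'd rather not load dplyr, here is a base R sketch of the same per-row result. It rebuilds the question's df so it runs on its own and uses ave(), which applies FUN within each group and returns a vector the same length as its input, much like mutate() does. The column name Smallest_time1 simply mirrors the dplyr version.

```r
# The question's data.
user   <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
score  <- c(1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1)
time_1 <- c(130, NA, 120, 245, NA, NA, NA, 841, NA, NA, 721, 612)
time_2 <- c(NA, 742, NA, NA, 812, 212, 214, NA, 919, 528, NA, NA)
df <- data.frame(user, score, time_1, time_2)

# ave() splits time_1 by user, applies FUN to each piece, and writes each
# group's result back onto every row of that group.
df$Smallest_time1 <- ave(df$time_1, df$user,
                         FUN = function(x) min(x, na.rm = TRUE))
df
```

As with the dplyr version, a user whose time_1 values were all NA would get Inf (with a warning from min()).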
    

  • We could use min() inside summarise() with the na.rm = TRUE argument to get one row per user:

    library(dplyr)
    df %>% 
      group_by(user) %>% 
      summarise(Smallest_time1 = min(time_1, na.rm = TRUE))
    
       user Smallest_time1
      <dbl>          <dbl>
    1     1            120
    2     2            841
    3     3            612
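A base R counterpart to the summarise() version, sketched with aggregate() and its formula interface (again rebuilding the question's df so the snippet runs standalone):

```r
# The question's data.
user   <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
score  <- c(1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1)
time_1 <- c(130, NA, 120, 245, NA, NA, NA, 841, NA, NA, 721, 612)
time_2 <- c(NA, 742, NA, NA, 812, 212, 214, NA, 919, 528, NA, NA)
df <- data.frame(user, score, time_1, time_2)

# One row per user: min() of time_1 within each user.
# The formula method's default na.action (na.omit) drops rows where
# time_1 is NA before grouping, so min() only sees non-missing values.
res <- aggregate(time_1 ~ user, data = df, FUN = min)
names(res)[2] <- "Smallest_time1"
res
```

One difference worth knowing: because NA rows are dropped up front, a user whose time_1 values were all NA would be missing from this result entirely, rather than appearing with Inf as in the summarise() version.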