Search code examples
rgroup-by

sum the value after group_by and based on date time format in R


I have a dataframe with this format :

group                   Time     Grade
    1    2014-03-03 09:15:00     31
    2    2010-10-20 17:22:00     28
    2    2010-10-20 20:08:00     28
    2    2010-11-28 08:43:00     28
    3    2019-12-06 14:41:00     31
    3    2019-12-06 14:41:00     28
    3    2020-01-07 11:01:00     28
    3    2020-02-16 17:35:00     28
    4    2019-07-01 07:10:00     31
    4    2019-08-01 08:56:00     31
    4    2019-09-01 12:03:00     31
    5    2016-08-19 17:12:00     60
    5    2016-08-19 17:10:00     31
    6    2021-11-03 10:41:00     28

I want to sum the value based on Time and each group. If group have multiple row data.Can devide some situation:

  1. condition1. In group have same date but with diffierent time, choose latest time of Grade.
    • ex1 : In group 2 will keep two rows like below:
    • ex2 : In group 5 will keep one row like below:
    2    2010-10-20 20:08:00     28
    2    2010-11-28 08:43:00     28

    5    2016-08-19 17:12:00     60
  1. condition2. In group have same date and same time, choose min of Grade.

    • ex : In group 3 will keep three rows like below:
    3    2019-12-06 14:41:00     28
    3    2020-01-07 11:01:00     28
    3    2020-02-16 17:35:00     28

Final, will keep rows like:

group                   Time     Grade
    1    2014-03-03 09:15:00     31
    2    2010-10-20 20:08:00     28
    2    2010-11-28 08:43:00     28
    3    2019-12-06 14:41:00     28
    3    2020-01-07 11:01:00     28
    3    2020-02-16 17:35:00     28
    4    2019-07-01 07:10:00     31
    4    2019-08-01 08:56:00     31
    4    2019-09-01 12:03:00     31
    5    2016-08-19 17:12:00     60
    6    2021-11-03 10:41:00     28

This is what the result should look like.

group    Total_G
    1         31
    2         56
    3         84
    4         93
    5         60
    6         28

Solution

  • We extract the Date from Time, group it along with group, arrange by 'grade', extract the row where the 'Time' is max, then do a grouping by 'group' and get the sum of 'Grade'

    library(dplyr)
     df1 %>%
       mutate(Time = as.POSIXct(Time)) %>%
       group_by(group, Date = as.Date(Time)) %>%
       arrange(Grade, .by_group = TRUE) %>% 
       slice_max(Time, with_ties = FALSE) %>%
       group_by(group) %>% 
       summarise(Grade = sum(Grade,  na.rm = TRUE), .groups = 'drop')
    

    -output

    # A tibble: 6 × 2
      group Grade
      <int> <int>
    1     1    31
    2     2    56
    3     3    84
    4     4    93
    5     5    60
    6     6    28
    

    data

    df1 <- structure(list(group = c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 
    4L, 4L, 5L, 5L, 6L), Time = c("2014-03-03 09:15:00", "2010-10-20 17:22:00", 
    "2010-10-20 20:08:00", "2010-11-28 08:43:00", "2019-12-06 14:41:00", 
    "2019-12-06 14:41:00", "2020-01-07 11:01:00", "2020-02-16 17:35:00", 
    "2019-07-01 07:10:00", "2019-08-01 08:56:00", "2019-09-01 12:03:00", 
    "2016-08-19 17:12:00", "2016-08-19 17:10:00", "2021-11-03 10:41:00"
    ), Grade = c(31L, 28L, 28L, 28L, 31L, 28L, 28L, 28L, 31L, 31L, 
    31L, 60L, 31L, 28L)), class = "data.frame", row.names = c(NA, 
    -14L))