I have a dataframe with this format :
group Time Grade
1 2014-03-03 09:15:00 31
2 2010-10-20 17:22:00 28
2 2010-10-20 20:08:00 28
2 2010-11-28 08:43:00 28
3 2019-12-06 14:41:00 31
3 2019-12-06 14:41:00 28
3 2020-01-07 11:01:00 28
3 2020-02-16 17:35:00 28
4 2019-07-01 07:10:00 31
4 2019-08-01 08:56:00 31
4 2019-09-01 12:03:00 31
5 2016-08-19 17:12:00 60
5 2016-08-19 17:10:00 31
6 2021-11-03 10:41:00 28
I want to sum the value based on Time and each group. If group have multiple row data.Can devide some situation:
2 2010-10-20 20:08:00 28
2 2010-11-28 08:43:00 28
5 2016-08-19 17:12:00 60
condition2. In group have same date and same time, choose min of Grade.
3 2019-12-06 14:41:00 28
3 2020-01-07 11:01:00 28
3 2020-02-16 17:35:00 28
Final, will keep rows like:
group Time Grade
1 2014-03-03 09:15:00 31
2 2010-10-20 20:08:00 28
2 2010-11-28 08:43:00 28
3 2019-12-06 14:41:00 28
3 2020-01-07 11:01:00 28
3 2020-02-16 17:35:00 28
4 2019-07-01 07:10:00 31
4 2019-08-01 08:56:00 31
4 2019-09-01 12:03:00 31
5 2016-08-19 17:12:00 60
6 2021-11-03 10:41:00 28
This is what the result should look like.
group Total_G
1 31
2 56
3 84
4 93
5 60
6 28
We extract the Date
from Time
, group it along with group
, arrange
by 'grade', extract the row where the 'Time' is max, then do a grouping by 'group' and get the sum
of 'Grade'
library(dplyr)
df1 %>%
mutate(Time = as.POSIXct(Time)) %>%
group_by(group, Date = as.Date(Time)) %>%
arrange(Grade, .by_group = TRUE) %>%
slice_max(Time, with_ties = FALSE) %>%
group_by(group) %>%
summarise(Grade = sum(Grade, na.rm = TRUE), .groups = 'drop')
-output
# A tibble: 6 × 2
group Grade
<int> <int>
1 1 31
2 2 56
3 3 84
4 4 93
5 5 60
6 6 28
df1 <- structure(list(group = c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L,
4L, 4L, 5L, 5L, 6L), Time = c("2014-03-03 09:15:00", "2010-10-20 17:22:00",
"2010-10-20 20:08:00", "2010-11-28 08:43:00", "2019-12-06 14:41:00",
"2019-12-06 14:41:00", "2020-01-07 11:01:00", "2020-02-16 17:35:00",
"2019-07-01 07:10:00", "2019-08-01 08:56:00", "2019-09-01 12:03:00",
"2016-08-19 17:12:00", "2016-08-19 17:10:00", "2021-11-03 10:41:00"
), Grade = c(31L, 28L, 28L, 28L, 31L, 28L, 28L, 28L, 31L, 31L,
31L, 60L, 31L, 28L)), class = "data.frame", row.names = c(NA,
-14L))