r dplyr lubridate

Summarize Dates


I want to sum values by date, Monday through Friday, but any values that fall on a Saturday or Sunday should be added to the preceding Friday.

# A tibble: 10 x 2
#    date       value
#    <date>     <dbl>
#  1 2020-01-02     1   --- Thurs
#  2 2020-01-03     1   --- Fri
#  3 2020-01-04     2   --- Sat
#  4 2020-01-06     1   --- Mon
#  5 2020-01-07     1   --- Tues
#  6 2020-01-08     1   --- Wed
#  7 2020-01-09     1   --- Thurs
#  8 2020-01-10     1   --- Fri
#  9 2020-01-11     2   --- Sat
# 10 2020-01-13     1   --- Mon

This is what I want (the original row numbers are kept to show which rows were merged):

#    date       value
#    <date>     <dbl>
#  1 2020-01-02     1   --- Thurs
#  2 2020-01-03     3   --- Fri
#  4 2020-01-06     1   --- Mon
#  5 2020-01-07     1   --- Tues
#  6 2020-01-08     1   --- Wed
#  7 2020-01-09     1   --- Thurs
#  8 2020-01-10     3   --- Fri
# 10 2020-01-13     1   --- Mon

Data

dd = structure(list(
    date = structure(c(18263, 18264, 18265, 18267, 18268,
                       18269, 18270, 18271, 18272, 18274), class = "Date"),
    value = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 1)),
    row.names = c(NA, -10L),
    class = c("tbl_df", "tbl", "data.frame"))

Solution

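  • We can use weekdays() to replace the 'date' values that fall on a weekend with NA, then fill each NA with the previous non-NA value (assuming the data is already arranged by 'date'), and then do a group_by sum. Note that fill() carries forward whatever date sits in the row above, so a weekend row lands on Friday only if that Friday is present in the data.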

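    library(dplyr)
    library(tidyr)

    dd %>%
        # weekdays() returns locale-dependent names; this assumes an English locale
        mutate(Weekday = weekdays(date),
               # blank out weekend dates so fill() can carry the previous date down
               date = replace(date, Weekday %in% c("Saturday", "Sunday"), NA)) %>%
        fill(date) %>%          # default direction is "down"
        group_by(date) %>%
        summarise(value = sum(value), .groups = 'drop')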
    

    -output

    # A tibble: 8 x 2
    #  date       value
    #  <date>     <dbl>
    #1 2020-01-02     1
    #2 2020-01-03     3
    #3 2020-01-06     1
    #4 2020-01-07     1
    #5 2020-01-08     1
    #6 2020-01-09     1
    #7 2020-01-10     3
    #8 2020-01-13     1
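
The answer above depends on English day names and on the row order. As a possible variation (a minimal sketch, not part of the original answer), the weekend dates can be shifted back arithmetically with lubridate::wday(), which returns a number and so does not depend on the locale. The helper column name `dow` is made up for illustration, and the data is rebuilt here with tibble() only so the block runs on its own. Unlike fill(), this always moves a weekend row to the calendar Friday, even if that Friday has no row of its own.

```r
library(dplyr)
library(lubridate)

# Same data as in the question, rebuilt so the example is self-contained
dd <- tibble(
  date  = as.Date(c("2020-01-02", "2020-01-03", "2020-01-04", "2020-01-06",
                    "2020-01-07", "2020-01-08", "2020-01-09", "2020-01-10",
                    "2020-01-11", "2020-01-13")),
  value = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 1)
)

result <- dd %>%
  mutate(
    # week_start = 1 numbers the days 1 = Monday ... 6 = Saturday, 7 = Sunday
    dow  = wday(date, week_start = 1),
    # Saturday moves back 1 day, Sunday moves back 2 days, weekdays stay put
    date = date - pmax(dow - 5, 0)
  ) %>%
  group_by(date) %>%
  summarise(value = sum(value), .groups = "drop")

result
```

On the sample data this should give the same eight rows as the output shown above, with the two Saturdays folded into 2020-01-03 and 2020-01-10.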