r, dataframe, lubridate

Periodic sum over time conditional on start and end date


I am trying to compute, for each day, week, and month, the sum of column x. A row's x should be included in the sum for a given day, week, or month whenever that period falls between the row's start and end dates.

I constructed this sample data frame:

library(data.table)
library(lubridate)

df <- data.frame(x=c(13,32,37,21,9,43,12,28),
                 start=c('2018-06-12','2019-02-12','2018-12-30','2020-02-05','2019-09-29','2017-05-19','2019-06-13','2020-04-12'), 
                 end=c('2018-09-13','2019-03-19','2020-01-10','2020-03-17','2020-10-10','2020-01-02','2019-07-19','2021-06-01'))

# convert the character columns to Date
df$start <- as.Date(df$start,"%Y-%m-%d")
df$end <- as.Date(df$end,"%Y-%m-%d")

I tried writing a for loop over each day to sum column x for the rows whose period covers that day, but I could not get it to work.

#for loop over days
days <- seq(from=as.Date("2017-01-01"), to=as.Date("2021-07-31"), by="days")
for (i in seq_along(days)){
  print(sum(df$x))}

Thank you very much for your help :)


Solution

  • Expand each row into one row per date between its start and end, then sum the x values for each date. complete() then fills in any dates that no row covers, with x set to 0.

    library(tidyverse)
    
    df %>%
      mutate(dates = map2(start, end, seq, by = 'days')) %>%
      unnest(dates) %>%
      group_by(dates) %>%
      summarise(x = sum(x)) %>%
      complete(dates = seq(min(dates), max(dates), by = 'days'), fill = list(x = 0)) 
    
    #   dates          x
    #   <date>     <dbl>
    # 1 2017-05-19    43
    # 2 2017-05-20    43
    # 3 2017-05-21    43
    # 4 2017-05-22    43
    # 5 2017-05-23    43
    # 6 2017-05-24    43
    # 7 2017-05-25    43
    # 8 2017-05-26    43
    # 9 2017-05-27    43
    #10 2017-05-28    43
    # … with 1,465 more rows
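
The question also asks for weekly and monthly sums, which the daily result above does not cover. The following is a minimal sketch of one way to extend the same expand-and-group idea, not part of the original answer. It assumes that a row counts toward a week or month if at least one of its days falls inside that period, and that each row is counted once per period. The helper name sum_by_period and the id column are hypothetical, introduced only for illustration. Note that lubridate's floor_date() starts weeks on Sunday unless the week_start argument or the lubridate.week.start option says otherwise.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(lubridate)

# Sample data from the question, with the dates already converted
df <- data.frame(
  x     = c(13, 32, 37, 21, 9, 43, 12, 28),
  start = as.Date(c("2018-06-12", "2019-02-12", "2018-12-30", "2020-02-05",
                    "2019-09-29", "2017-05-19", "2019-06-13", "2020-04-12")),
  end   = as.Date(c("2018-09-13", "2019-03-19", "2020-01-10", "2020-03-17",
                    "2020-10-10", "2020-01-02", "2019-07-19", "2021-06-01"))
)

# Hypothetical helper: sum x per period ("week" or "month").
# Assumption: a row contributes to every period that its
# [start, end] range overlaps, and only once per period.
sum_by_period <- function(data, unit) {
  data %>%
    mutate(id    = row_number(),                        # remember the source row
           dates = map2(start, end, seq, by = "days")) %>%
    unnest(dates) %>%
    mutate(period = floor_date(dates, unit = unit)) %>% # first day of the week/month
    distinct(id, period, x) %>%                         # one entry per row and period
    group_by(period) %>%
    summarise(x = sum(x), .groups = "drop") %>%
    # fill periods that no row covers with 0
    complete(period = seq(min(period), max(period), by = unit),
             fill = list(x = 0))
}

weekly  <- sum_by_period(df, "week")
monthly <- sum_by_period(df, "month")
```

The distinct() step is what keeps a row from being added once per day within the same week or month. If a period should only count when it lies entirely inside the start and end dates, the filter would need to compare the period's first and last day against start and end instead.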