Search code examples
rtidyverselubridatecumulative-sum

Calculate time passed within group and flag once a certain (cumulative) time period has passed


Consider the following data:

df <- structure(list(date = structure(c(10904, 10613, 10801, 10849, 
10740, 10680, 10780, 10909, 10750, 10814), class = "Date"), group = c(1L, 
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-10L))

which gives:

         date group
1  1999-11-09     1
2  1999-01-22     2
3  1999-07-29     1
4  1999-09-15     2
5  1999-05-29     1
6  1999-03-30     1
7  1999-07-08     1
8  1999-11-14     2
9  1999-06-08     2
10 1999-08-11     2

I now want to calculate

  • a) the months that have been passed between two neighbouring dates per group (I know how to do this)
  • b) flag rows when a certain time period (3 months) has passed and if it has passed, i would kind of reset and again look when 3 months have been passed from that date.

So for a) I'm doing this:

library(tidyverse)
library(lubridate)
df %>%
  group_by(group) %>%
  arrange(group, date) %>%
  mutate(months_passed = time_length(interval(lag(date), date), "months"))

which gives:

# A tibble: 10 x 3
# Groups:   group [2]
   date       group months_passed
   <date>     <int>         <dbl>
 1 1999-03-30     1        NA    
 2 1999-05-29     1         1.97 
 3 1999-07-08     1         1.3  
 4 1999-07-29     1         0.677
 5 1999-11-09     1         3.35 
 6 1999-01-22     2        NA    
 7 1999-06-08     2         4.55 
 8 1999-08-11     2         2.10 
 9 1999-09-15     2         1.13 
10 1999-11-14     2         1.97

But for b) I'm lost. What I want to do is:

  • Look at each group separately.
  • Calculate the months_passed between row 1 and row 2 (here: 1.97 months for group 1)
  • If it is < 3 months, continue with the next row and calculate the time difference between the already passed months and the current time difference (here: 1.97 + 1.3 months).
  • Now that the difference is bigger >= 3 months, I want to flag row 3.
  • Now I reset the cumulative time difference and again start calculating the difference with the next row (here: 0.67 months) and so on.

Expected outcome would be:

# A tibble: 10 x 4
# Groups:   group [2]
   date       group months_passed time_flag
   <date>     <int>         <dbl>     <int>
 1 1999-03-30     1        NA             0
 2 1999-05-29     1         1.97          0
 3 1999-07-08     1         1.3           1
 4 1999-07-29     1         0.677         0
 5 1999-11-09     1         3.35          1
 6 1999-01-22     2        NA             0
 7 1999-06-08     2         4.55          1
 8 1999-08-11     2         2.10          0
 9 1999-09-15     2         1.13          1
10 1999-11-14     2         1.97          0

Any ideas?


Solution

  • BTW, with some better searching (now that I know which terms to search for), I was able to bring up this one completely working without additional functions:

    dplyr / R cumulative sum with reset

    df %>%
      group_by(group) %>%
      arrange(group, date) %>%
      mutate(months_passed = time_length(interval(lag(date), date), "months"),
             months_passed = if_else(is.na(months_passed), 0, months_passed),
             time_flag = if_else(accumulate(months_passed, ~if_else(.x >= 3, .y, .x + .y)) >= 3, 1, 0))
    
    # A tibble: 10 x 4
    # Groups:   group [2]
       date       group months_passed time_flag
       <date>     <int>         <dbl>     <dbl>
     1 1999-03-30     1         0             0
     2 1999-05-29     1         1.97          0
     3 1999-07-08     1         1.3           1
     4 1999-07-29     1         0.677         0
     5 1999-11-09     1         3.35          1
     6 1999-01-22     2         0             0
     7 1999-06-08     2         4.55          1
     8 1999-08-11     2         2.10          0
     9 1999-09-15     2         1.13          1
    10 1999-11-14     2         1.97          0