Search code examples
rdatetimelaglead

counting consecutive dates R


How to check in a column whether dates are one off from each other


Solution

  • With tidyverse you can group_by both the id as well as a second id id2 that would group the rows together that are separated by a difference of one day. Then, the number of consecutive days column would include the difference between the last date and the first date (or zero if not the first date).

    library(tidyverse)
    
    data %>%
      mutate(date = as.Date(date, format = "%m-%d-%Y")) %>%
      arrange(id, date) %>%
      group_by(id) %>%
      group_by(id2 = cumsum(c(T, diff(date) > 1)), .add = T) %>%
      mutate(num_con_days = ifelse(date == first(date), last(date) - date + 1, 0)) %>%
      ungroup %>%
      select(-id2)
    

    Output

         day    id date       num_con_days
       <dbl> <dbl> <date>            <dbl>
     1     1    10 2021-01-01            1
     2     3    10 2021-01-03            4
     3     4    10 2021-01-04            0
     4     5    10 2021-01-05            0
     5     6    10 2021-01-06            0
     6     1    24 2021-01-01            2
     7     2    24 2021-01-02            0
     8     4    24 2021-01-04            3
     9     5    24 2021-01-05            0
    10     6    24 2021-01-06            0
    

    Edit: Using your other example, with renamed column names, you have the following data.frame:

       id        day num_consecutive_days
    1   1 2021-01-02                    1
    2   2 2021-01-02                    1
    3   2 2021-01-05                    2
    4   2 2021-01-06                    0
    5   2 2021-01-12                    1
    6   3 2021-01-01                    2
    7   3 2021-01-02                    0
    8   3 2021-01-04                    1
    9   3 2021-01-11                    1
    10  4 2021-01-01                    1
    

    Here, your day is in Year-Month-Day format (so in converting to a Date, you don't need to provide a separate format).

    Also, you will need to make sure your column names match and are consistent for day. See below similar code - this should be the same as your desired output.

    df %>%
      mutate(day = as.Date(day)) %>%
      arrange(id, day) %>%
      group_by(id) %>%
      group_by(id2 = cumsum(c(T, diff(day) > 1)), .add = T) %>%
      mutate(num_con_days = ifelse(day == first(day), last(day) - day + 1, 0)) %>%
      ungroup %>%
      select(-id2)
    

    Output

       id        day num_consecutive_days
    1   1 2021-01-02                    1
    2   2 2021-01-02                    1
    3   2 2021-01-05                    2
    4   2 2021-01-06                    0
    5   2 2021-01-12                    1
    6   3 2021-01-01                    2
    7   3 2021-01-02                    0
    8   3 2021-01-04                    1
    9   3 2021-01-11                    1
    10  4 2021-01-01                    1