Search code examples
rdatebinning

How to bin date by month


I have the following data:

  df <- data.frame(dt1 = c("2019-05-02", "2019-01-02", "2019-06-02"), 
                   dt2 =   c("2019-08-30", "2019-04-24", "2019-12-06") )
> df
         dt1        dt2
1 2019-05-02 2019-08-30
2 2019-01-02 2019-04-24
3 2019-06-02 2019-12-06

Here is what I want to do:

i) I want create factors by binning, for example, for the first date, the dates binned as 2019-07-31, 2019-06-30, 2019-05-31, so essentially binning by dt2.
ii) I want to count the total number of dates in each bin.

The expected output is:

        dt1        dt2      val_count
1 2019-05-02 2019-08-30         3
2 2019-01-02 2019-04-24         3 
3 2019-06-02 2019-12-06         6 

I found this post relevant.

Note: I do not want take difference between months of two dates.

Thank you for suggestions.


Solution

  • It's pretty messy but if you want to count how many last date of the months are in between dt1 and dt2, you may try

    library(lubridate)
    library(dplyr)
    
    fd <- paste0(lubridate::year(min(df$dt1, df$dt2)), "-02-01") %>% as.Date()
    
    ld <- paste0(lubridate::year(max(df$dt1, df$dt2))+1, "-01-01") %>% as.Date()
    x <- seq.Date(fd, ld, by = "month") - 1
    df %>%
      rowwise() %>%
      mutate(val_count = length(x[dt1 < x & x < dt2]))
    
      dt1        dt2        val_count
      <chr>      <chr>          <int>
    1 2019-05-02 2019-08-30         3
    2 2019-01-02 2019-04-24         3
    3 2019-06-02 2019-12-06         6
    

    Choice of < or <= depends on your purpose.

    To get total days between dt1 and dt2,

    df %>%
      rowwise() %>%
      mutate(val_count = length(x[dt1 < x & x < dt2])) %>%
      mutate(dd = as.Date(dt2) - as.Date(dt1))
    
      dt1        dt2        val_count dd      
      <chr>      <chr>          <int> <drtn>  
    1 2019-05-02 2019-08-30         3 120 days
    2 2019-01-02 2019-04-24         3 112 days
    3 2019-06-02 2019-12-06         6 187 days
    

    Add

    df %>%
      rowwise() %>%
      mutate(val_count = length(x[dt1 < x & x < dt2]),
             val_count = ifelse(val_count == 0, 1, val_count)) %>%
      mutate(dd = as.Date(dt2) - as.Date(dt1))
    
      dt1        dt2        val_count dd      
      <chr>      <chr>          <dbl> <drtn>  
    1 2019-05-02 2019-08-30         3 120 days
    2 2019-01-02 2019-04-24         3 112 days
    3 2019-06-02 2019-12-06         6 187 days
    4 2019-06-01 2019-06-02         1   1 days