Search code examples
rdateaveragedays

How can I get the average number of days between changes in a variable over a period of time in R?


I have a dataset that has date and another variable (bank rate). Here is a snippet of the data:

![enter image description here

I want to calculate the average number of days between each consecutive change in Bank rate. For example to get an output as such:

enter image description here

Essentially, I am trying to calculate the average number of days a rate remains for before it changes.

I am able to do the usual difftime() function, However I need it to specifically only calculate the difference when there are changes in the rate, and then average it out. I am new to R and unable to figure out how to go about this


Solution

  • I have a made a random sequence of dates in the timeframe as above and have used bank_rate from above and put them in a DF.

    This DF is ordered for date. Data which do not show any change in bank_rate are then removed by filter. (see consecutive bank_rates of 2). A new variable days_from_before is created which calculates the number of days of consecutive dates.

    The average is calculated as the mean from days_from_before.

    library(dplyr)
    
    set.seed(123)
    date <- sample(seq(as.Date("2018/02/07"), as.Date("2023/01/15"), by = "day"), 14)
    bank_rate <- c(1.5, 1.5, rep(2, 6), 0.5, 1.25, 4.5, 4.5, 4.75, 4.75)
    
    df <- data.frame(date, bank_rate)
    
    df
    #>          date bank_rate
    #> 1  2019-03-28      1.50
    #> 2  2019-05-15      1.50
    #> 3  2018-08-04      2.00
    #> 4  2019-07-17      2.00
    #> 5  2018-08-20      2.00
    #> 6  2020-09-01      2.00
    #> 7  2021-03-24      2.00
    #> 8  2021-09-21      2.00
    #> 9  2021-07-13      0.50
    #> 10 2021-07-28      1.25
    #> 11 2020-12-10      4.50
    #> 12 2021-12-05      4.50
    #> 13 2019-12-03      4.75
    #> 14 2019-10-01      4.75
    
    ddf <- df |>
      arrange(date) |>
      filter(bank_rate != dplyr::lag(bank_rate, default = 0)) |> 
      mutate(
        days_from_before = as.numeric(difftime(date, dplyr::lag(date))),
        days_from_before = ifelse(is.na(days_from_before), 0, days_from_before)
      )
    
    ddf
    #>          date bank_rate days_from_before
    #> 1  2018-08-04      2.00                0
    #> 2  2019-03-28      1.50              236
    #> 3  2019-07-17      2.00              111
    #> 4  2019-10-01      4.75               76
    #> 5  2020-09-01      2.00              336
    #> 6  2020-12-10      4.50              100
    #> 7  2021-03-24      2.00              104
    #> 8  2021-07-13      0.50              111
    #> 9  2021-07-28      1.25               15
    #> 10 2021-09-21      2.00               55
    #> 11 2021-12-05      4.50               75
    
    mean(ddf$days_from_before)
    #> [1] 110.8182