
Loop through consecutive time periods


I want to create a loop that extracts data from a time period, then moves the period forward by one day and repeats the analysis. The main problem I'm having is how to do this with a time period rather than a single day. I have added a Julian day column to try to make it easier (i.e. it is now just a sequence of numbers rather than dates), but I still can't quite figure it out.

Here is some example data:

           Date   Nor_MM Julianday
6441 2090-06-01 22.58582       152
6442 2090-06-02 20.43654       153
6443 2090-06-03 17.37954       154
6444 2090-06-04 18.12772       155
6445 2090-06-05 19.53053       156
6446 2090-06-06 23.25154       157
6447 2090-06-07 24.52292       158
6448 2090-06-08 24.83597       159
6449 2090-06-09 24.67915       160
6450 2090-06-10 24.22688       161
structure(list(Date = structure(c(2090-01, 43982, 43983, 43984, 
43985, 43986, 43987, 43988, 43989, 43990), class = "Date"), Nor_MM = c(22.58582103, 
20.43654256, 17.37954095, 18.12772066, 19.53053131, 23.25153522, 
24.52291687, 24.83597434, 24.67915157, 24.22688304), Julianday = c(152, 
153, 154, 155, 156, 157, 158, 159, 160, 161)), row.names = 6441:6450, class = "data.frame")

I want the total number of days within a 16-day period that reach 20 degrees or more, i.e. the count of such days between 2090-06-01 (Julian day 152) and 2090-06-16 (Julian day 167). I have calculated this using the code below.

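library(tibbletime)  # provides filter_time(); df must be a tbl_time, e.g. as_tbl_time(df, index = Date)
df1 <- filter_time(df, time_formula = '2090-06-01' ~ '2090-06-16')
sum(df1$Nor_MM >= 20)  # number of days in the period at or above 20 degrees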

The problem is that I want this number for every possible 16-day period in the data set (of which there are 75). I cannot figure out how to loop over a time period so that the period moves by one day with each iteration. I need code that does the above, then moves the period along by one day and repeats it for 2090-06-02 to 2090-06-17, and so on for all periods in the data frame.

Any help with this would be amazing, thank you!


Solution

  • rollapplyr (the r on the end means right aligned) performs a rolling operation (here sum), using the ith width, i.e. the number of positions to sum over, for the ith data element. Its arguments are the data, the widths to sum over, and the function, i.e. sum.

    findInterval(Date - 16, Date) finds the position of the date 16 days back, or of the latest date before that if no such date exists (0 if there is none). Subtracting that from the current position gives the number of rows to include.

    If you are using the tidyverse, you can optionally replace transform with mutate.

    library(zoo)
    transform(df, 
      ndays = rollapplyr(data = Nor_MM >= 20, 
                         width = seq_along(Date) - findInterval(Date - 16, Date), 
                         FUN = sum))
    

    giving

               Date   Nor_MM Julianday ndays
    6441 1975-09-21 22.58582       152     1
    6442 2090-06-02 20.43654       153     1
    6443 2090-06-03 17.37954       154     1
    6444 2090-06-04 18.12772       155     1
    6445 2090-06-05 19.53053       156     1
    6446 2090-06-06 23.25154       157     2
    6447 2090-06-07 24.52292       158     3
    6448 2090-06-08 24.83597       159     4
    6449 2090-06-09 24.67915       160     5
    6450 2090-06-10 24.22688       161     6
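
    To make the width calculation concrete, here is a small sketch in base R only. It rebuilds the Date column from the values in the question's dput() output (2090-01 evaluates to 2089, which prints as 1975-09-21 and explains the gap in the first row). The names pos, back and width are purely illustrative.

    ```r
    # Dates as stored in the question's dput(): days since 1970-01-01
    Date <- as.Date(c(2089, 43982:43990), origin = "1970-01-01")

    pos   <- seq_along(Date)                # row position of each date
    back  <- findInterval(Date - 16, Date)  # last row dated 16 or more days earlier, 0 if none
    width <- pos - back                     # number of rows inside each 16-day window

    back   # 0 1 1 1 1 1 1 1 1 1
    width  # 1 1 2 3 4 5 6 7 8 9
    ```

    Row 2 gets a width of 1 rather than 2 because the first row lies years before it, so it is excluded from the window.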
    

    In the sample data there is a gap between the first date and the second date. If your real data has no gaps, it is even easier: the width can be given as 16, and partial = TRUE makes it use however many elements are available when there are fewer than 16.

    # if no gaps in dates
    transform(df, ndays = rollapplyr(Nor_MM >= 20, 16, sum, partial = TRUE))
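
    Since the question asked for a loop, the same counts can also be sketched without zoo by visiting each row and counting the qualifying days in the 16 days ending on that row's date. This is an illustrative base R alternative, not the method used above; count_window and its days and threshold arguments are hypothetical names, and the data frame is rebuilt from the question's dput() values.

    ```r
    df <- data.frame(
      Date = as.Date(c(2089, 43982:43990), origin = "1970-01-01"),
      Nor_MM = c(22.58582103, 20.43654256, 17.37954095, 18.12772066, 19.53053131,
                 23.25153522, 24.52291687, 24.83597434, 24.67915157, 24.22688304)
    )

    # Count the values at or above `threshold` in the `days`-day window ending on `end_date`
    count_window <- function(end_date, dates, values, days = 16, threshold = 20) {
      in_window <- dates > end_date - days & dates <= end_date
      sum(values[in_window] >= threshold)
    }

    # One window per row, each ending on that row's date
    df$ndays <- vapply(
      seq_len(nrow(df)),
      function(i) count_window(df$Date[i], df$Date, df$Nor_MM),
      integer(1)
    )

    df$ndays  # 1 1 1 1 1 2 3 4 5 6
    ```

    Because the window is defined by dates rather than row counts, this version copes with gaps in the dates, at the cost of scanning the whole column once per row.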