Search code examples
rpercentile

Running percentile value for each calendar day from multi-year data in R


I need to calculate the 30-day running (window) 90th percentile maximum temperature value for each calendar day from multi-year data. For example, to calculate the 90th percentile value on Jan 1st, I have to choose a 30-day window centered on Jan 1st, i.e., data from December 16 to January 15 for all 42 years. So, I would have 1260 (30*42) data points for each day. I need the value for 366 days. I have 42-year daily datasets from 1980 to 2022, which look like this:

date    tmax    tmin
1981-01-01  19.2    5.4
1981-01-02  18.2    5
1981-01-03  16.1    3.8
1981-01-04  17.2    4.4
1981-01-05  15.7    2.4
1981-01-06  15.6    5.4
1981-01-07  11.2    4.1
1981-01-08  14.8    -1
1981-01-09  15  0.8
1981-01-10  16.2    -0.4

.........................
.........................
.........................
2022-12-25  17.4    4.4
2022-12-26  16.5    4.1
2022-12-27  17  5.4
2022-12-28  15.2    3.6
2022-12-29  8.1 7.7
2022-12-30  13.5    6
2022-12-31  14.8    4.5

How can I do this in R? Initially, I thought it would be simple like this.

temp_data <- read.csv("temperature.csv")

#as the date and tmax data are being read as characters by R
temp_data$tmax <- as.numeric(temp_data$tmax)
temp_data$date <- as.Date(temp_data$date, "%Y-%m-%d")
#Create a day of year variable for the day of the year
temp_data$doy <- as.numeric(format(temp_data$date,"%j"))

#load libraries
library(dplyr)
library(zoo)

temp_data_90th <- temp_data %>% 
  group_by(doy) %>% 
  summarize(rolling_90th = rollapply(tmax, width = 30, FUN = quantile, prob = 0.9, align = "center", na.rm=T))

But I don't think it gave the correct result since temp_data_90th has 4,470 rows with 13 data for each day of year.

Please can you suggest where I am doing wrong? Thank you in advance for your support.


Solution

  • To illustrate this we will need reproducible data so use DF shown reproducibly in the Note at the end.

    Calculate yday which is the day of the year for each row of DF. Then for each possible yday (0:365) get value in all rows whose yday is within 15 back to 14 forward of that yday modulo 366 and apply quantile to those values giving q90.

    No packages are used.

    yday <- as.POSIXlt(DF$date)$yday
    q90 <- sapply(0:365, function(x) 
      quantile(DF$value[yday %in% (seq(x-15, x+14) %% 366)], prob = 0.9, na.rm = TRUE))
    

    With rollapply it is slightly shorter. Using yday from above we have:

    library(zoo)
    q90 <- rollapply(seq(-15, 365 + 14) %% 366, 30, function(x)
      quantile(DF$value[yday %in% x], prob = 0.9, na.rm = TRUE))
    

    Note

    d <- seq(as.Date("2000-01-01"), as.Date("2022-12-31"), "day")
    DF <- data.frame(date = d, value = seq_along(d))