
Calculation of the maximum duration over threshold in R (timeseries)


I have an xts time series of temperature data at 5-minute resolution.

head(dataset)
Time                Temp
2016-04-26 10:00:00 6.877
2016-04-26 10:05:00 6.877
2016-04-26 10:10:00 6.978
2016-04-26 10:15:00 6.978
2016-04-26 10:20:00 6.978
  1. I want to calculate the longest duration for which the temperature exceeds a certain threshold (let's say 20 °C).
  2. I want to find all periods in which the temperature exceeds that threshold, together with their durations.
  3. First I create a data.frame from my xts data:

    df=data.frame(Time=index(dataset),coredata(dataset))
    head(df)
    Time                  Temp
    1 2016-04-26 10:00:00 6.877
    2 2016-04-26 10:05:00 6.877
    3 2016-04-26 10:10:00 6.978
    4 2016-04-26 10:15:00 6.978
    5 2016-04-26 10:20:00 6.978
    6 2016-04-26 10:25:00 7.079
    
  4. Then I create a subset containing only the data that exceeds the threshold:

    sub=(subset(x=df,subset = df$Temp>20))
    head(sub)
                Time         Temp
    7514 2016-05-22 12:05:00 20.043
    7515 2016-05-22 12:10:00 20.234
    7516 2016-05-22 12:15:00 20.329
    7517 2016-05-22 12:20:00 20.424
    7518 2016-05-22 12:25:00 20.615
    7519 2016-05-22 12:30:00 20.805
    

    But now I'm having trouble calculating how long each event lasts, i.e. how long the temperature stays above the threshold. I don't know how to identify a connected period and calculate its duration.

I would be happy if you have a solution for this question (it's my first thread, so please excuse minor mistakes). If you need more information on my data, feel free to ask.


Solution

  • This may work. I use the following data as an example:

    df <- structure(list(Time = structure(c(1463911500, 1463911800, 1463912100, 
    1463912400, 1463912700, 1463913000), class = c("POSIXct", "POSIXt"
    ), tzone = ""), Temp = c(20.043, 20.234, 6.329, 20.424, 20.615, 
    20.805)), row.names = c(NA, -6L), class = "data.frame")
    
    > df
                     Time   Temp
    1 2016-05-22 12:05:00 20.043
    2 2016-05-22 12:10:00 20.234
    3 2016-05-22 12:15:00  6.329
    4 2016-05-22 12:20:00 20.424
    5 2016-05-22 12:25:00 20.615
    6 2016-05-22 12:30:00 20.805
    
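    library(dplyr)
    # rleid() comes from the data.table package, not dplyr,
    # so it is called with an explicit namespace below
    df %>% 
      # flag samples above the threshold and give each consecutive run its own id
      mutate(tmp_Temp = Temp > 20, id = data.table::rleid(tmp_Temp)) %>% 
      # keep only periods with high temperature
      filter(tmp_Temp) %>%
      # for each period/event, get its duration (first to last timestamp)
      group_by(id) %>%
      # fixed units keep the durations of all events comparable
      summarise(event_duration = difftime(last(Time), first(Time), units = "mins"))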
    
    
         id event_duration
      <int> <time>        
    1     1  5 mins       
    2     3 10 mins
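
    The question also asks for the longest single period. Below is a minimal base-R sketch of the same idea using `rle()`, so it needs no extra packages. It assumes the data are regularly sampled and contain no missing values. The names `threshold`, `events` and `longest` are only illustrative, and the time zone is set to UTC here purely to make the example reproducible.

```r
# Example data from the answer above (epoch seconds, 5-minute steps).
df <- data.frame(
  Time = as.POSIXct(c(1463911500, 1463911800, 1463912100,
                      1463912400, 1463912700, 1463913000),
                    origin = "1970-01-01", tz = "UTC"),
  Temp = c(20.043, 20.234, 6.329, 20.424, 20.615, 20.805)
)

threshold <- 20  # assumed threshold in degrees Celsius

# Run-length encode the above/below-threshold flag:
# each run is one connected period above or below the threshold.
runs <- rle(df$Temp > threshold)
run_end <- cumsum(runs$lengths)          # row index where each run ends
run_start <- run_end - runs$lengths + 1  # row index where each run starts

# Keep only the runs that are above the threshold.
above <- runs$values
events <- data.frame(
  start = df$Time[run_start[above]],
  end   = df$Time[run_end[above]]
)

# Duration from first to last timestamp of each event, in minutes.
events$duration_mins <- as.numeric(
  difftime(events$end, events$start, units = "mins")
)

# The event with the maximum duration.
longest <- events[which.max(events$duration_mins), ]
```

    `events` then holds one row per period above the threshold, and `longest` is the row with the largest duration. As in the dplyr version, the duration is measured from the first to the last timestamp of a period. If each sample should count as covering a full 5-minute interval, one sampling interval would have to be added to every duration.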