Search code examples
rdatethreshold

Find start and end date of when value exceeds a certain threshold for at least X time periods


I have three columns, one of which contains a statistic, the other a critical value, and the final one the date. I would like to know how to find the start and end date when the threshold of the critical value is exceeded for at least n periods. A simple example of the data is as follows:

Date        Statistic  Critical Value
2003-01-01  1.1        0.80
2003-01-08  1.5        0.90
2003-01-15  2.1        0.91
2003-01-22  0.5        0.95
2003-01-29  1.4        0.98
2003-02-05  1.3        1.00
2003-02-12  0.8        1.10

I would expect the output with a minimum period of 3 to be:

Start      End
2003-01-01 2003-01-15

However if the minimum period was 2 rather than 3 I would expect:

Start      End
2003-01-01 2003-01-15
2003-01-29 2003-02-05

Solution

  • A very self explanatory solution with data.table would be: (There might be more compact solutions)

    require(data.table)
    setDT(DT)
    DT[,exceeds:=Statistic>CriticalValue]
    DT[,mydiff := c(0,diff(exceeds))]
    DT[mydiff<0, mydiff := 0]
    DT[,run := cumsum(mydiff)]
    DT[c(exceeds),.(start=Date[1], end = Date[.N], length = .N), by=run]
    

    Results in:

       run      start        end length
    1:   0 2003-01-01 2003-01-15      3
    2:   1 2003-01-29 2003-02-05      2
    

    So if you only want runs with length>2 use

    DT_agg <- DT[c(exceeds),.(start=Date[1], end = Date[.N], length = .N), by=run]
    DT_agg[length>2]
    
       run      start        end length
    1:   0 2003-01-01 2003-01-15      3