I have three columns, one of which contains a statistic, the other a critical value, and the final one the date. I would like to know how to find the start and end date when the threshold of the critical value is exceeded for at least n periods. A simple example of the data is as follows:
Date Statistic Critical Value
2003-01-01 1.1 0.80
2003-01-08 1.5 0.90
2003-01-15 2.1 0.91
2003-01-22 0.5 0.95
2003-01-29 1.4 0.98
2003-02-05 1.3 1.00
2003-02-12 0.8 1.10
I would expect the output with a minimum period of 3 to be:
Start End
2003-01-01 2003-01-15
However if the minimum period was 2 rather than 3 I would expect:
Start End
2003-01-01 2003-01-15
2003-01-29 2003-02-05
A very self explanatory solution with data.table
would be:
(There might be more compact solutions)
require(data.table)
setDT(DT)
DT[,exceeds:=Statistic>CriticalValue]
DT[,mydiff := c(0,diff(exceeds))]
DT[mydiff<0, mydiff := 0]
DT[,run := cumsum(mydiff)]
DT[c(exceeds),.(start=Date[1], end = Date[.N], length = .N), by=run]
Results in:
run start end length
1: 0 2003-01-01 2003-01-15 3
2: 1 2003-01-29 2003-02-05 2
So if you only want runs with length>2
use
DT_agg <- DT[c(exceeds),.(start=Date[1], end = Date[.N], length = .N), by=run]
DT_agg[length>2]
run start end length
1: 0 2003-01-01 2003-01-15 3