Tags: r, algorithm, optimization, finance, quantitative-finance

Algorithmically detecting jumps in a time-series


I have about 50 datasets that include all trades within a timeframe of 30 days for about 10 pairs on 5 exchanges. All pairs are of the same asset class, meaning they are strongly correlated and are expected to have similar properties, but they are on different scales. An example of this data would be

set.seed(1)

n <- 1000
dates <- seq(as.POSIXct("2019-08-05 00:00:00", tz="UTC"), as.POSIXct("2019-08-05 23:59:00", tz="UTC"), by="1 min")
x <- data.frame("t" = sort(sample(dates, n)), "p" = cumsum(sample(c(-1, 1), n, TRUE)))

Plot example

Roughly, I need to identify the relevant local minima and maxima, which happen daily. The yellow marks are my points of interest. Unlike this example, there is usually only one such point per day and I consider each day separately. However, it is hard to filter out noise from my actual points of interest.

My actual goal is to find the exact point at which the pair started to make a jump and the exact point at which the jump is over. This needs to be as accurate as possible, as I want to observe which asset moved first and which asset followed, and at which point in time (as said, they are highly correlated). Between two extreme values, I want to minimize the distance and maximize the relative/absolute change, as my points of interest are usually close to each other and their difference is quite large.

I already looked at other questions like Finding local maxima and minima and Algorithm to locate local maxima, and also at this algorithm, which has the same goal. However, my dataset is extremely noisy. I already reduced the dataset to 5-minute intervals, but this caused the functions that identify local minima and maxima to miss the relevant points. Therefore, this was not a good solution given my goal.

How can I achieve my goal with a reasonably accurate algorithm? Manually skimming through all the time series is not an option, since this would require me to evaluate 50 * 30 time series manually, which is too time-consuming. I'm really puzzled and have been trying to find a suitable solution for a week.

If more code snippets are needed, I'm happy to share them. However, they didn't give me meaningful results, which runs against the idea of providing a minimal working example, so I decided to leave them out for now.

EDIT: First off, I updated the plot and added timestamps to the dataset to give you an idea of the actual resolution. Ideally, the algorithm would detect both jumps on the left: the inner two dots because they are closer together and the jump between them is uninterrupted, and the outer dots because their values are more extreme. This probably also answers the question of whether the algorithm is allowed to look into the future. Yes: if there is another local extremum within a range of, say, 30 observations (or 30 minutes), then ignore the intermediate local extremum. In my data, jumps have ranged from 2% to roughly 15%, so a jump needs to be at least 2% to be considered, and only if a threshold of 15 (this might be adaptable) consecutive steps in the same direction before / after the peaks and valleys is reached.
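
For illustration only, the last two criteria (a move of at least 2% made up of at least 15 consecutive steps in one direction) could be sketched in base R roughly as follows. The function name `find_runs` and its arguments are hypothetical, the defaults are just the numbers mentioned above, and the toy price vector is made up. On noisy tick data a strict "consecutive steps" rule is likely too brittle, so treat this as a statement of the criteria rather than a solution.

```r
# Hypothetical helper (not from any package): find runs of at least
# `min_run` consecutive steps in the same direction whose relative
# change is at least `min_move`.
find_runs <- function(p, min_run = 15, min_move = 0.02) {
  r <- rle(sign(diff(p)))             # runs of up (+1), down (-1), flat (0) steps
  end    <- cumsum(r$lengths) + 1     # index in p where each run ends
  start  <- end - r$lengths           # index in p where each run starts
  change <- p[end] / p[start] - 1     # relative change over the run
  keep <- r$values != 0 & r$lengths >= min_run & abs(change) >= min_move
  data.frame(start = start[keep], end = end[keep], change = change[keep])
}

# Made-up prices: one down step, 17 up steps, one down step
p <- c(100, 99, 100:116, 115)
runs <- find_runs(p)
runs  # one run: start = 2, end = 19
```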

A very naive approach was to subset the data around the global minimum and maximum of a day. In most cases, this denoised the data and worked as an indicator. However, this is not robust when the global extrema are not in the range of the jump.

Hope this clarifies why this isn't a statistical question (there are some tests to determine whether a jump has happened, but not for jump arrival time afaik).


In case anyone wants a real example: this is a corresponding graph, this is the raw data of the relevant period and this is the reduced dataset.



Solution

  • Perhaps as a starting point, look at function streaks in package PMwR (which I maintain). A streak is defined as a move of a specified size that is uninterrupted by a countermove of the same size. The function works with returns, not differences, so I add 100 to your data.

    For instance:

    library(PMwR)  # provides streaks()
    set.seed(1)
    n <- 1000
    x <- 100 + cumsum(sample(c(-1, 1), n, TRUE))
    
    plot(x, type = "l")
    s <- streaks(x, up = 0.12, down = -0.12)
    abline(v = s[, 1])
    abline(v = s[, 2])
    

    The vertical lines show the starts and ends of streaks.

    Streaks

    Perhaps you can then filter the identified streaks by required criteria such as length. Or you may play around with different thresholds for up and down moves (this is not really recommended in the current implementation, but perhaps the results are good enough). For instance, up streaks might look as follows. A green vertical line shows the start of a streak; a red line shows its end.

    plot(x, type = "l")
    s <- streaks(x, up = 0.12, down = -0.05)
    s <- s[!is.na(s$state) & s$state == "up", ]
    abline(v = s[, 1], col = "green")
    abline(v = s[, 2], col = "red")
    

    Up streaks
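
The filtering step mentioned above could be sketched like this. It is only a sketch under stated assumptions: `filter_streaks` is a hypothetical helper, not part of PMwR, and it assumes the result of `streaks` is a data frame with columns `start`, `end`, `state` and `return`, with `start` and `end` being positions in `x` (the code above only relies on the first two columns and on `s$state`). The thresholds are placeholders taken from the numbers in the question, and the small data frame is a hand-made stand-in so that the example runs without the package.

```r
# Hypothetical helper (not part of PMwR): keep only streaks that look
# like jumps, i.e. large enough and short enough.
# `s` is assumed to have columns `start`, `end`, `state` and `return`.
filter_streaks <- function(s, min_return = 0.02, max_length = 30) {
  len <- s$end - s$start               # streak length in observations
  keep <- !is.na(s$state) &            # drop streaks with undefined state
    abs(s$return) >= min_return &      # move of at least 2%
    len <= max_length                  # move completes within 30 observations
  s[keep, , drop = FALSE]
}

# Hand-made stand-in for the output of streaks(), for illustration only
s <- data.frame(start  = c(1, 40, 200, 260),
                end    = c(40, 55, 260, 270),
                state  = c(NA, "up", "down", "up"),
                return = c(0.01, 0.05, -0.08, 0.015))
jumps <- filter_streaks(s)
jumps  # only the streak from 40 to 55 remains
```

With real data, one would pass the result of `streaks(x, ...)` instead of the stand-in and tune `min_return` and `max_length` per pair.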