In my trading application I have "live ticks" of stock prices. I need to maintain SMA. Let's assume I want SMA of 20 candles, where duration of each candle is 10 seconds. This means that
Every 10 seconds I have "checkpoint" where:
Every tick:
So on any tick I need to "recalculate" SMA. In most cases only price of the last candle is changed (cause we using last price). Once per 10 seconds I need a little bit more extra work - I need to "forget" average of the outdated candle, and "store" average of "just created" candle.
Can you suggest how to implement this with lowest latency? Low latency is primary requirement.
I'm not sure whether this is the approach you're looking for but here is the pseudocode for very quick SMAs.
Simple Moving Average:
I assume that your data is coming in the form of some stream and stored in continuous memory location (at least with continuously mappable addresses)
x[1] to x[2000] contain the first 2000 data points
// they don't have to be a real array, could just be a function which manages
// a 'circular' array and maps the 'locations' to real locations in memory.
// Either case, it's small enough to be fully in the cache hopefully
//
// Subsequent prices will be accessible in 'locations x[2001], etc.
// Here we are looking to calculate the MA over the latest 2000 ticks
MA2000(1,2000) = (x[1] + x[2] + ... + x[2000]) / 2000 // Usual average
// Done only once
MA2000(2,2001) = MA2000(1,2000) * 2000 + x[2001] - x[1]
MA2000(2,2001) /= 2000
That way with two additions and one multiplication ( with 1/2000 ) you can generate subsequent moving averages for the new ticks.
Exponential moving average: That is a decent alternative, as mentioned above:
// For an N-day EMA
Alpha = 2 / (N+1) // one time calculation
EMA(new) = Alpha * NewTickPrice + (1-Alpha) * EMA(old)
Here it's not really an N-day moving average. It's just a weighted moving average with ~87% weightage to the last N-days, so almost N-days is more like it.
Note on compiler optimizations:
Do note that turning on SSE or AVX options if available will enable massive speedup of these algorithms as multiple calculations can be churned out in a single CPU cycle.