I have been using TA-Lib to calculate EMAs but each time I add a new number to the array, TA-Lib performs the calculation from scratch again.
I am doing analysis on a rather large set of data (> 1M rows) and this is quite slow.
What would be the fastest way to calculate the new EMA when a new value is added?
Let x be the vector of length n containing your samples, that is x[0], ..., x[n-1]. Let y be the vector containing the EMA. Then y is given by the recurrence:

y[k] = y[k-1]*a + x[k]*(1-a)

where a is the EMA parameter, which lies between 0 and 1. The closer a is to 1, the smoother the curve.
Therefore, to compute the EMA you just need:

a = 0.9
y = [0.0] * n   # preallocate the output
y[0] = x[0]     # seed the EMA with the first sample
for k in range(1, n):
    y[k] = y[k-1]*a + x[k]*(1-a)
Then, when a new sample x[n] arrives, you can compute the EMA y[n] without redoing the full calculation:

y[n] = y[n-1]*a + x[n]*(1-a)
If you keep your samples in a growing list instead, the same update is:
y.append(y[-1]*a + x[-1]*(1-a))
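If you are consuming samples one at a time, it can be convenient to wrap this O(1) update in a small helper that only remembers the last EMA value. This is a minimal sketch; the StreamingEMA name and interface are my own, not part of TA-Lib:

class StreamingEMA:
    def __init__(self, a):
        self.a = a          # smoothing parameter, between 0 and 1
        self.value = None   # last EMA value; None until the first sample

    def update(self, x):
        if self.value is None:
            self.value = x  # seed with the first sample, as above
        else:
            self.value = self.value*self.a + x*(1 - self.a)
        return self.value

ema = StreamingEMA(a=0.9)
for sample in (1.0, 2.0, 3.0):
    print(ema.update(sample))  # approximately 1.0, 1.1, 1.29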
Edit:
If you really want to speed up computing the whole EMA at once, you can use Numba and NumPy:
import numpy as np
from numba import njit
from timeit import timeit

n = 1000000
x_np = np.random.randn(n)   # your data
x_list = list(x_np)
a = 0.9

def ema_list(x, a):
    # Pure-Python EMA over a list.
    y = [x[0]]
    for k in range(1, len(x)):
        y.append(y[-1]*a + x[k]*(1-a))
    return y

@njit("float64[:](float64[:], float64)")
def ema_np(x, a):
    # Same recurrence over a NumPy array, JIT-compiled by Numba.
    y = np.empty_like(x)
    y[0] = x[0]
    for k in range(1, len(x)):
        y[k] = y[k-1]*a + x[k]*(1-a)
    return y

print(timeit(lambda: ema_list(x_list, a), number=1))  # 0.7080 seconds
print(timeit(lambda: ema_np(x_np, a), number=1))      # 0.008015 seconds
The list implementation takes about 708 ms, whereas the Numba version takes about 8 ms, roughly 88 times faster. A NumPy implementation without Numba takes about as long as the list version, because the recurrence depends on the previous output and so the loop still runs in plain Python.
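One caveat if you need the output to match TA-Lib: TA-Lib's EMA is parameterized by a period rather than by a, with the weight on the new sample being 2/(period + 1), so in the notation above a = 1 - 2/(period + 1). TA-Lib also seeds the EMA differently (by default, with a simple average of the first period samples rather than x[0]), so the first few values may not line up exactly. A quick conversion sketch:

def a_from_period(period):
    # TA-Lib-style smoothing: the weight on the new sample x[k]
    # is 2/(period + 1), so the weight on y[k-1] is the rest.
    return 1.0 - 2.0 / (period + 1.0)

a = a_from_period(20)   # period-20 EMA -> a ~= 0.9048
# then update incrementally as before:
# y_new = y_prev*a + x_new*(1 - a)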