In Python, what is the fastest way to calculate an EMA by reusing the previous calculations?

I have been using TA-Lib to calculate EMAs but each time I add a new number to the array, TA-Lib performs the calculation from scratch again.

I am doing analysis on a rather large set of data (> 1M rows) and this is quite slow.

What would be the fastest way to calculate the new EMA when a new value is added?

Solution

Let x be the vector of length n containing your samples, that is x[0], ..., x[n-1]. Let y be the vector containing the EMA. Then y is given by the equation:

y[k] = y[k-1] * a + x[k] * (1-a)

where a is the EMA parameter, a number between 0 and 1. The closer a is to 1, the more weight the previous EMA value carries and the smoother the curve.

Therefore, to compute the EMA you just need:

a = 0.9
n = len(x)
y = [0.0] * n  # one EMA value per sample
y[0] = x[0]    # seed the EMA with the first sample
for k in range(1, n):
    y[k] = y[k-1]*a + x[k]*(1-a)
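If you are coming from an N-period EMA as in TA-Lib, you may want to pick a so that it corresponds to that period. A minimal sketch, assuming the common convention in which the new sample is weighted by 2/(N+1), so that a = 1 - 2/(N+1) in the equation above. The helper name decay_from_period is hypothetical, and TA-Lib may seed the first values differently, so the earliest outputs need not match exactly:

```python
def decay_from_period(period):
    """Map an N-period span to the parameter a in y[k] = y[k-1]*a + x[k]*(1-a).

    Assumes the common convention that the new sample gets weight 2/(N+1).
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    return 1.0 - 2.0 / (period + 1.0)


a = decay_from_period(19)  # 1 - 2/20, i.e. approximately 0.9
```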

Then if you get another sample, that is x[n], you can compute the EMA y[n] without doing the full calculation with:

y[n] = y[n-1]*a + x[n]*(1-a)

The above is pseudocode. With Python lists, once the new sample has been appended to x, it would look something like this:

y.append(y[-1]*a + x[-1]*(1-a))
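If you only need the latest EMA value and not the whole history, you do not even have to keep the lists around. A minimal sketch of that idea, assuming a small helper class (the name StreamingEMA is made up for illustration) that stores just the previous EMA value and applies the same update rule:

```python
class StreamingEMA:
    """Keeps only the latest EMA value, so each new sample costs O(1)."""

    def __init__(self, a):
        if not 0.0 <= a <= 1.0:
            raise ValueError("a must be between 0 and 1")
        self.a = a
        self.value = None  # no sample seen yet

    def update(self, sample):
        if self.value is None:
            # First sample: y[0] = x[0]
            self.value = float(sample)
        else:
            # Same recurrence as above: y[k] = y[k-1]*a + x[k]*(1-a)
            self.value = self.value * self.a + sample * (1.0 - self.a)
        return self.value


ema = StreamingEMA(a=0.9)
history = [ema.update(s) for s in [1.0, 2.0, 3.0]]  # approximately [1.0, 1.1, 1.29]
```

Each call to update does a constant amount of work, regardless of how many samples came before.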

Edit:

If you really want to speed up the computation of the whole EMA at once, you can use numba and numpy:

import numpy as np
from numba import njit
from timeit import timeit

n = 1_000_000
x_np = np.random.randn(n)  # your data
x_list = list(x_np)
a = 0.9

def ema_list(x, a):
    # Pure-Python version: grow a list one EMA value at a time.
    y = [x[0]]
    for k in range(1, len(x)):
        y.append(y[-1]*a + x[k]*(1-a))
    return y


# The explicit signature makes numba compile at decoration time,
# so the timing below does not include compilation.
@njit("float64[:](float64[:], float64)")
def ema_np(x, a):
    y = np.empty_like(x)
    y[0] = x[0]
    for k in range(1, len(x)):
        y[k] = y[k-1]*a + x[k]*(1-a)
    return y


print(timeit(lambda: ema_list(x_list, a), number=1)) # 0.7080 seconds
print(timeit(lambda: ema_np(x_np, a), number=1)) # 0.008015 seconds

The list implementation takes about 708 ms, whereas the numba and numpy version takes about 8 ms, which is about 88 times faster. A numpy implementation without numba takes about as long as the list implementation.
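When new rows arrive in batches instead of one at a time, the same idea applies: seed the recurrence with the last EMA value you already have and only process the new samples. A rough sketch under that assumption (the function name ema_continue is hypothetical; it is written in plain numpy here, and decorating it with njit as above should be possible but is not shown or timed):

```python
import numpy as np


def ema_continue(prev_ema, new_x, a):
    """EMA of new_x, continuing from prev_ema (the last value already computed)."""
    y = np.empty(len(new_x), dtype=np.float64)
    last = prev_ema
    for k in range(len(new_x)):
        last = last * a + new_x[k] * (1 - a)
        y[k] = last
    return y


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a = 0.9

# EMA of x[1:] in one go, seeded with y[0] = x[0] ...
whole = ema_continue(x[0], x[1:], a)

# ... and the same values computed in two batches.
first = ema_continue(x[0], x[1:3], a)
second = ema_continue(first[-1], x[3:], a)
assert np.allclose(np.concatenate([first, second]), whole)
```

The cost of each call depends only on the number of new samples, not on the length of the history.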