Search code examples
pythonnumpycythonnumba

Dynamically discounted cumulative sum in Numpy


I have a frequently occurring problem where I have two arrays of the same length: one with values and one with dynamic decay factors; and wish to calculate a vector of the decayed cumulative sum at each position. Using a Python loop to express the desired recurrence we have the following:

c = np.empty_like(x)
c[0] = x[0]
for i in range(1, len(x)):
    c[i] = c[i-1] * d[i] + x[i]

The Python code is very clear and readable, but slows things down significantly. I can get around this by using Numba to JIT-compile it, or Cython to precompile it. If the discount factors were a fixed number (which is not the case), I could have used SciPy's signals library and do an lfilter (see https://stackoverflow.com/a/47971187/1017986).

Is there a more "Numpythonic" way to express this without sacrificing clarity or efficiency?


Solution

  • Mathematically speaking, this is a first-order non-homogeneous recurrence relation with variable coefficients.

    Please note that coefficients (in your case, values in d[1:]) must be different from 0.

    Here is a way to solve your recurrence relation using NumPy functions. Note that d[0] is supposed to be equal to 1 (your algorithm is not using it anyway). In this solution, as you can see, the values in c do not depend on previous values in c. I.e., c[i] does not depend on c[i-1].

    g = np.insert(x[1:], 0, 0)
    
    pd = np.cumprod(d)
    c = pd * (x[0] + np.cumsum(g / pd))
    

    Let's make an example. Here I define two functions, one that computes the recurrence relation with your code, and one that uses NumPy:

    def rec(d, x):
        c = [x[0]]
        for i in range(1, len(x)):
            c.append(d[i] * c[-1] + x[i])
        return np.array(c)
    
    def num(d, x):
        g = np.insert(x[1:], 0, 0)
        pd = np.cumprod(d)
        return pd * (x[0] + np.cumsum(g / pd))
    

    Here are some usage examples:

    >>> rec(np.array([3, 4, 5.2, 6.1]), np.array([2, 3.2, 4, 5.7]))
    array([  2.   ,  11.2  ,  62.24 , 385.364])
    >>> num(np.array([3, 4, 5.2, 6.1]), np.array([2, 3.2, 4, 5.7]))
    array([  2.   ,  11.2  ,  62.24 , 385.364])
    
    >>> rec(np.array([3, 4.3, 5.2]), np.array([6.7, 7.1, 1.2]))
    array([  6.7  ,  35.91 , 187.932])
    >>> num(np.array([3, 4.3, 5.2]), np.array([6.7, 7.1, 1.2]))
    array([  6.7  ,  35.91 , 187.932])
    

    If you're interested in the mathematical details behind this answer (which are out of scope here), check this Wikipedia page.

    Edit

    This solution, while mathematically correct, might introduce numerical issues during the computation (because of the numpy.cumprod function), especially if d contains many values that are close to 0.