Search code examples
pythonnumpycythonnumerical-methods

Optimize this loop with numpy


Generating 5 millions points r[i] recursively with:

import numpy as np
n, a, b, c = 5000000, 0.0000002, 0.5, 0.4
eps = np.random.normal(0, 1, n)
sigma = np.ones(n) * np.sqrt(a)
r = np.zeros(n)
for i in range(1,n):
    sigma[i] = np.sqrt(a + b * r[i-1] ** 2 + c * sigma[i-1] ** 2)
    r[i] = sigma[i] * eps[i]

uses approximatively 17 seconds on my standard i5 laptop computer.

I have used Cython quite often in the past and I know that using it here would probably optimize by a factor 10 < k < 100.

But before having to use Cython in such cases, I was wondering: would there be a plain Numpy/Python method that I don't know that would optimize this much?


Solution

  • Simply changing it to math.sqrt instead of np.sqrt gives you about 40% speedup here.

    Since I'm quite a numba fanatic I tried the numba version versus your one (initial) and the math-one (normal)

    import numpy as np
    import math
    import numba as nb
    n, a, b, c = 500000, 0.0000002, 0.5, 0.4
    eps = np.random.normal(0, 1, n)
    sigma = np.ones(n) * np.sqrt(a)
    r = np.zeros(n)
    
    def initial(n, a, b, c, eps, sigma, r):
        for i in range(1,n):
            sigma[i] = np.sqrt(a + b * r[i-1] ** 2 + c * sigma[i-1] ** 2)
            r[i] = sigma[i] * eps[i]
    
    def normal(n, a, b, c, eps, sigma, r):
        for i in range(1,n):
            sigma[i] = math.sqrt(a + b * r[i-1] ** 2 + c * sigma[i-1] ** 2)
            r[i] = sigma[i] * eps[i]
    
    @nb.njit
    def function(n, a, b, c, eps, sigma, r):
        for i in range(1,n):
            sigma[i] = math.sqrt(a + b * r[i-1] ** 2 + c * sigma[i-1] ** 2)
            r[i] = sigma[i] * eps[i]
    

    Then just to verify the results are the same:

    sigma1 = sigma.copy()
    sigma2 = sigma.copy()
    sigma3 = sigma.copy()
    r1 = r.copy()
    r2 = r.copy()
    r3 = r.copy()
    
    initial(n, a, b, c, eps, sigma1, r1)   
    normal(n, a, b, c, eps, sigma2, r2)       
    function(n, a, b, c, eps, sigma3, r3)
    np.testing.assert_array_almost_equal(sigma1, sigma2)
    np.testing.assert_array_almost_equal(sigma1, sigma3)
    np.testing.assert_array_almost_equal(r1, r2)
    np.testing.assert_array_almost_equal(r1, r3)
    

    Well what about speed (I used n=500000 to have some faster timeit results):

    %timeit initial(n, a, b, c, eps, sigma1, r1)   
    1 loop, best of 3: 7.27 s per loop
    %timeit normal(n, a, b, c, eps, sigma2, r2)   
    1 loop, best of 3: 4.49 s per loop    
    %timeit function(n, a, b, c, eps, sigma3, r3)
    100 loops, best of 3: 17.7 ms per loop
    

    I know you didn't want cython so numba is probably also out of the question but the speedup is amazing (410 times faster!)