Tags: performance, numpy, optimization, matrix, eigenvalue

Optimize mean outer products


I am currently writing a short program to carry out some analysis of random matrix eigenvalue distributions, but the parameter choice required for my analysis has made the whole thing extremely slow. Essentially, I need to run the snippet below roughly 5000 times and collect the complete list of eigenvalues at the end.

import numpy as np

# One long time series, from which M overlapping windows of length N
# (each shifted by B from the previous one) are taken
C = np.zeros((N, N))
time_series = np.random.normal(mu, sigma, N + B*(M-1))

# Mean of the M outer products of the windows with themselves
for k in range(int(M)):
    C += np.outer(time_series[k*B : N + k*B], time_series[k*B : N + k*B])
C = C/M

eg_v = np.linalg.eigvalsh(C)

where I need N = 1000, B around 10, and M = 100. With this choice of parameters, however, the program takes around 4-5 hours to run on my reasonably fast laptop.

Leaving hardware limitations aside, is there anything I can do in the code itself to speed the whole thing up?

Thanks in advance!


Solution

  • You can replace the loop with a vectorized solution using np.tensordot

    Thus, the following -

    C = np.zeros((N,N))
    for k in range(int(M)):
        C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])
    

    could be replaced by -

# Starting index of each of the M windows, broadcast against 0..N-1 to get
# an (M, N) array of indices: row k holds k*B, k*B+1, ..., k*B + N - 1
idx = (np.arange(M)*B)[:,None] + np.arange(N)

# Index time_series with it to get all M windows at once, i.e. the
# equivalent of "time_series[k*B : N + k*B]" for every k
time_idx = time_series[idx]

# Sum the M outer products in a single call by reducing over the first axis
C = np.tensordot(time_idx,time_idx,axes=([0],[0]))
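
    To see what idx contains, here is a toy example with illustrative values M = 3, B = 2, N = 4 (not from the original answer): each row is one length-N window of indices, shifted by B from the previous row.

    M, B, N = 3, 2, 4
    idx = (np.arange(M)*B)[:,None] + np.arange(N)
    print(idx)
    # [[0 1 2 3]
    #  [2 3 4 5]
    #  [4 5 6 7]]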
    

    The tensordot could also be replaced by a simple dot product:

    C = time_idx.T.dot(time_idx)
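
    Equivalently (this variant is not in the original answer and is shown only for comparison), the reduction over the window axis can be spelled out with np.einsum:

    C = np.einsum('ki,kj->ij', time_idx, time_idx)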
    

    Runtime test

    Functions :

    def original_app(time_series,B,N,M):
        C = np.zeros((N,N))
        for k in range(int(M)):
            C += np.outer(time_series[k*B : (N) + k*B], time_series[k*B : (N) + k*B])
        return C
    
    def vectorized_app(time_series,B,N,M):
        idx = (np.arange(M)*B)[:,None] + np.arange(N)
        time_idx = time_series[idx]
        return np.tensordot(time_idx,time_idx,axes=([0],[0]))
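
    If memory is a concern, the fancy indexing time_series[idx] materializes an extra (M, N) copy. As an untimed sketch (not part of the original answer; assumes NumPy 1.20+ for sliding_window_view), the windows can instead be taken as a strided view:

    def windowed_app(time_series, B, N, M):
        # All length-N windows as a view, keep every B-th start -> shape (M, N)
        windows = np.lib.stride_tricks.sliding_window_view(time_series, N)[::B]
        # Same sum of outer products as time_idx.T.dot(time_idx)
        return windows.T.dot(windows)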
    

    Inputs :

    In [115]: # Inputs
         ...: mu = 1.2
         ...: sigma = 0.5
         ...: N = 1000
         ...: M = 100
         ...: B = 10
         ...: time_series = np.random.normal(mu,sigma,  (N + B*(M-1))  )
         ...: 
    

    Timings :

    In [116]: out1 = original_app(time_series,B,N,M)
    
    In [117]: out2 = vectorized_app(time_series,B,N,M)
    
    In [118]: np.allclose(out1,out2)
    Out[118]: True
    
    In [119]: %timeit original_app(time_series,B,N,M)
    1 loops, best of 3: 1.56 s per loop
    
    In [120]: %timeit vectorized_app(time_series,B,N,M)
    10 loops, best of 3: 26.2 ms per loop
    

    Thus, we are seeing a 60x speedup for the inputs listed in the question!
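
    To plug this back into the workflow described in the question (roughly 5000 realizations, collecting every eigenvalue), the outer loop could look something like the sketch below; n_runs and all_eigs are illustrative names, and the division by M restores the mean from the question's snippet.

    n_runs = 5000                        # number of realizations mentioned in the question
    all_eigs = []
    for _ in range(n_runs):
        time_series = np.random.normal(mu, sigma, N + B*(M-1))
        C = vectorized_app(time_series, B, N, M) / M   # mean of the M outer products
        all_eigs.append(np.linalg.eigvalsh(C))
    all_eigs = np.concatenate(all_eigs)  # complete list of eigenvalues

    With N = 1000, the np.linalg.eigvalsh call is likely to dominate each iteration once C is computed this quickly.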