Tags: python, numpy, vectorization, autocorrelation

Is it possible to speed up this mean autocorrelation calculation in numpy using vectorization?


I am looking to speed up the following mean autocorrelation code, which uses numpy's standard correlation function. Is it possible to vectorize this?

input: A -> shape (100, 3000) = 100-day time series for 3000 items

output: float -> the mean (across the 3000 items) of the lag-5 autocorrelation of each item

import numpy as np
import time

A = np.random.rand(100,3000)

start_time = time.time()
# For each item (column), correlate the series with its 5-day shift,
# then average the 3000 resulting lag-5 autocorrelations.
np.nanmean(np.array([np.corrcoef(np.array([A[:-5,i],A[5:,i]]))[1,0] for i in range(A.shape[1])]),axis=0)
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.35528588 seconds ---


Solution

  • I assume you mainly care about a smaller runtime and less about how it is achieved. The following approach achieves a more than 3x speedup by streamlining the calculation of the correlation coefficients.

    More specifically, it replaces calls to np.corrcoef with calls to a custom function fast_corrcoef. In essence, fast_corrcoef follows the numpy approach for calculating the correlation coefficient, except that it skips a lot of redundant computation. Each call of np.corrcoef calls np.cov to compute the covariance matrix, which requires the averages of the two series S[:-lag] and S[lag:]. Computing those averages independently duplicates work, because the two series share the overlap S[lag:-lag]. fast_corrcoef therefore sums the overlap S[lag:-lag] once and obtains both averages from that intermediate sum plus the two short tails of length lag. For a time series of length n, this needs about (n - 2 * lag) + 2 * lag + 2 = n + 2 additions instead of the roughly 2 * (n - lag) additions for two independent sums. With A.shape = (100, 3000) and lag = 5, that saves about (2 * (100 - 5) - (100 + 2)) * 3000 = 264000 additions, i.e. close to half of all additions spent on the averages.
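    As a standalone illustration of this shared-sum trick, here is a minimal sketch (the variable names are illustrative, not taken from the answer's code):

    import numpy as np
    
    # One series S; both lagged views S[:-lag] and S[lag:] share S[lag:-lag].
    S = np.random.rand(100)
    lag = 5
    n = S.shape[0] - lag                        # length of each lagged view
    
    mid_sum = S[lag:-lag].sum()                 # shared overlap, summed once
    mean_head = (mid_sum + S[:lag].sum()) / n   # mean of S[:-lag]
    mean_tail = (mid_sum + S[-lag:].sum()) / n  # mean of S[lag:]
    
    assert np.isclose(mean_head, S[:-lag].mean())
    assert np.isclose(mean_tail, S[lag:].mean())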

    Further, since the covariance matrix is symmetric, computing it does not require the full matrix multiplication that np.cov performs via np.dot. That call has been replaced by an expression which does only the three vector dot products that are actually necessary, i.e. it calculates the covariance once instead of twice (see fast_corrcoef below).

    import numpy as np
    import time
    
    
    def fast_corrcoef(X, lag):
        n = X.shape[1]
        # Sum over the overlap shared by both rows (A[lag:-lag, i]); computed once.
        mid_sum = X[0, lag:].sum()
        # Derive both row means from the shared sum plus the short tails,
        # then center both rows in place.
        X -= np.array([(mid_sum + X[0, :lag].sum()) / n, (mid_sum + X[1, -lag:].sum()) / n])[:, None]
        # Pearson correlation from the three necessary dot products:
        # one cross product (covariance) and two squared norms (variances).
        return (X[0,:] * X[1,:]).sum() / (np.sqrt((X[0,:] ** 2).sum()) * np.sqrt((X[1,:] ** 2).sum()))
    
    A = np.random.rand(100,3000)
    
    lag = 5
    
    def orig(A):
        return np.nanmean(np.array([np.corrcoef(np.array([A[:-5,i],A[5:,i]]))[1,0] for i in range(A.shape[1])]),axis=0)
    
    def faster(A, lag):
        corrcoefs = np.empty([A.shape[1]])
        for i in range(A.shape[1]):
            # Stack the two lagged views of item i into a (2, days - lag) array.
            corrcoefs[i] = fast_corrcoef(np.array([A[:-lag, i], A[lag:, i]]), lag)
        return np.nanmean(corrcoefs, axis=0)
    
    start_time = time.time()
    res_orig = orig(A)
    runtime_orig = time.time() - start_time
    print(f'orig: runtime: {runtime_orig:.4f}s; result: {res_orig}')
    
    
    start_time = time.time()
    res_faster = faster(A, lag)
    runtime_faster = time.time() - start_time
    print(f'fast: runtime: {runtime_faster:.4f}s; result: {res_faster}')
    
    assert abs(res_orig - res_faster) < 1e-16
    
    speedup = runtime_orig / runtime_faster
    
    print(f'speedup: {speedup:.4f}x')
    

    This code printed the following (when I ran it in Google Colab):

    orig: runtime: 0.4214s; result: -0.00961872402216293
    fast: runtime: 0.1138s; result: -0.009618724022162932
    speedup: 3.7030x
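
    Finally, since the question explicitly asks about vectorization: the Python-level loop over the 3000 items can be removed entirely by computing all correlation coefficients at once with array operations. The following is a minimal sketch of that idea (the function name is mine, not part of the answer above):

    import numpy as np
    
    def mean_lag_autocorr(A, lag=5):
        # Lagged views of all items at once, each of shape (days - lag, items).
        X = A[:-lag, :]
        Y = A[lag:, :]
        # Center every column by its own mean.
        Xc = X - X.mean(axis=0)
        Yc = Y - Y.mean(axis=0)
        # Per-column Pearson correlation, computed for all columns together.
        num = (Xc * Yc).sum(axis=0)
        den = np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum(axis=0))
        return np.nanmean(num / den)

    On inputs of this shape, eliminating the per-item loop in this way typically gives a further substantial speedup over looping in Python, though exact timings will vary with hardware and numpy version.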