python pandas performance for-loop vectorization

How to speed up/vectorize a multilevel iteration calculating rolling covariance matrix?

Since for-loops are slow in Python, I need to speed up the following code.

Things I tried:

1. apply: I haven't figured out how to use it on a DataFrame with a MultiIndex.

2. Numba: it seems that neither Numba nor Bodo supports pandas rolling.

The code is as follows:

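import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(9, 3), columns=['A', 'B', 'C'])
shape = np.full(df.shape[1], 1)  # vector of ones, one entry per column

def func_cov(df):
    df_result = pd.DataFrame()
    # one 3x3 covariance matrix per row, stacked under a MultiIndex (row, column)
    df_cov = df.rolling(3, min_periods=3).cov()
    for i in df.index:
        # shape.T @ cov @ shape, i.e. the sum of all entries of the matrix
        df_result.loc[i, 'result'] = np.dot(shape.T, np.dot(df_cov.loc[i], shape))
    return df_result

df_result = func_cov(df)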
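Regarding the apply attempt: the MultiIndex frame returned by rolling(...).cov() can be reduced without apply or a Python loop by grouping on its outer index level. This is a minimal sketch, assuming the all-ones shape vector from the question, so that shape.T @ cov @ shape is simply the sum of every entry in each 3x3 block. It uses groupby rather than apply and is not part of the original code.

```python
import numpy as np
import pandas as pd

# Same setup as the question: random data, window of 3 rows.
df = pd.DataFrame(np.random.randn(9, 3), columns=['A', 'B', 'C'])

# rolling().cov() returns one 3x3 block per original row, indexed by a
# MultiIndex (original row label, column name).
df_cov = df.rolling(3, min_periods=3).cov()

# With an all-ones weight vector, shape.T @ cov @ shape is the sum of all
# entries of each block. Group on the outer index level instead of looping
# with .loc; min_count=1 keeps the all-NaN blocks of the first rows as NaN
# instead of turning them into 0.
block_sums = df_cov.groupby(level=0).sum(min_count=1)
df_result = block_sums.sum(axis=1, min_count=1).to_frame('result')
print(df_result)
```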


df:
    A   B   C
0   0.191484    0.765756    -1.288696
1   -0.111369   1.276903    1.567775
2   -0.209460   2.920247    0.142898
3   0.169375    1.096265    -0.646460
4   3.847551    0.936200    -1.221572
5   -1.783127   0.426784    1.311940
6   -0.417902   0.253048    0.097059
7   -1.176098   -0.975650   1.481306
8   -1.429595   0.257955    -0.832083


desired df_result:
    result
0   NaN
1   NaN
2   3.258732
3   1.579507
4   2.359369
5   3.684835
6   4.364114
7   0.125943
8   0.981440
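Every entry of the desired result is shape.T @ cov @ shape with shape all ones, i.e. the sum of all entries of a sample covariance matrix. Because Var(A + B + C) equals the sum of Cov(X, Y) over all pairs of columns X, Y, this is the rolling sample variance of the row sums, so the whole computation can collapse to a single rolling call. The sketch below is an alternative technique (not the answer's Numba approach); it rebuilds the df printed above from its displayed, rounded values and reproduces the desired output.

```python
import pandas as pd

# The df printed in the question (values as displayed, rounded to 6 decimals).
df = pd.DataFrame(
    [[0.191484, 0.765756, -1.288696],
     [-0.111369, 1.276903, 1.567775],
     [-0.209460, 2.920247, 0.142898],
     [0.169375, 1.096265, -0.646460],
     [3.847551, 0.936200, -1.221572],
     [-1.783127, 0.426784, 1.311940],
     [-0.417902, 0.253048, 0.097059],
     [-1.176098, -0.975650, 1.481306],
     [-1.429595, 0.257955, -0.832083]],
    columns=['A', 'B', 'C'],
)

# Sum of all covariance entries == variance of the row sums (both use ddof=1).
df_result = df.sum(axis=1).rolling(3, min_periods=3).var().to_frame('result')
print(df_result)
```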

Solution

You can convert the DataFrame to a NumPy array and then do all the work with Numba and plain loops:

import numba as nb
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(9, 3), columns=['A', 'B', 'C'])
df_result = pd.DataFrame()
shape = np.full(df.shape[1], 1)

# The explicit signature makes Numba compile eagerly: the first argument must be
# a C-contiguous 2D float64 array, the second a 1D float64 array.
@nb.njit('(float64[:,::1], float64[:])')
def fast_func_cov(values, shape):
    result = np.empty(len(values))
    # a 3-row window is incomplete for the first two rows
    result[0] = result[1] = np.nan
    for i in range(2, len(values)):
        # np.cov treats rows as variables, hence the transpose
        cov_mat = np.cov(values[i-2:i+1, :].T)
        result[i] = np.dot(shape.T, np.dot(cov_mat, shape))
    return result

# df.values is not necessarily C-contiguous, so copy it if needed
values = np.ascontiguousarray(df.values)
df_result['result'] = fast_func_cov(values, shape.astype(np.float64))

On my machine, the computation takes 0.016 ms, compared to 7 ms for the initial function, which is about 440 times faster. That being said, the Pandas assignment takes 0.032 ms, so the code is about 150 times faster overall.
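If Numba is not available, a NumPy-only variant is possible because w.T @ cov @ w equals the sample variance of the weighted row sums values @ w. This is a sketch assuming NumPy 1.20 or later (for sliding_window_view); the function name rolling_quadratic_form and its window parameter are hypothetical, introduced only for illustration. It is a different technique from the Numba loop above, not a re-implementation of it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_quadratic_form(values, weights, window=3):
    """Hypothetical helper: for each row i, compute weights @ cov @ weights,
    where cov is the sample covariance of rows i-window+1..i of `values`.
    The first window-1 entries are NaN, like rolling(window, min_periods=window)."""
    # Project each row onto the weight vector: shape (n,)
    projected = values @ weights
    # Trailing windows over the projected series: shape (n - window + 1, window)
    windows = sliding_window_view(projected, window)
    result = np.full(len(values), np.nan)
    # w.T @ cov @ w == sample variance (ddof=1) of the projected values
    result[window - 1:] = windows.var(axis=1, ddof=1)
    return result

# Usage with random data and the all-ones weights from the question
rng = np.random.default_rng(0)
values = rng.standard_normal((9, 3))
out = rolling_quadratic_form(values, np.ones(3))
print(out)
```

Working on the projected 1D series avoids building any covariance matrix at all, and the same helper covers non-uniform weight vectors.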