python pandas performance for-loop vectorization

How to speed up/vectorize a multilevel iteration calculating rolling covariance matrix?


Since for loops perform poorly in Python, I need to speed up the following code.

Things I tried:

1. apply: I haven't figured out how to use it on a multi-level DataFrame.

2. Numba: it seems that neither Numba nor Bodo supports pandas rolling.

The code is below:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(9,3),columns=['A','B','C'])
df_result = pd.DataFrame()
shape = np.full(df.shape[1],1)

def func_cov(df):
    df_cov = df.rolling(3,min_periods=3).cov()
    for i in df.index:
        df_result.loc[i,'result'] = np.dot(shape.T,np.dot(df_cov.loc[i], shape))
    return df_result

func_cov(df)


df:
    A   B   C
0   0.191484    0.765756    -1.288696
1   -0.111369   1.276903    1.567775
2   -0.209460   2.920247    0.142898
3   0.169375    1.096265    -0.646460
4   3.847551    0.936200    -1.221572
5   -1.783127   0.426784    1.311940
6   -0.417902   0.253048    0.097059
7   -1.176098   -0.975650   1.481306
8   -1.429595   0.257955    -0.832083


desired df_result:
    result
0   NaN
1   NaN
2   3.258732
3   1.579507
4   2.359369
5   3.684835
6   4.364114
7   0.125943
8   0.981440



Solution

  • You can convert the dataframe to a NumPy array and then do all the work with Numba and basic loops:

    import numba as nb
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(9,3),columns=['A','B','C'])
    df_result = pd.DataFrame()
    shape = np.full(df.shape[1],1)

    # The explicit signature makes Numba compile eagerly, at decoration time.
    @nb.njit('(float64[:,::1], float64[:])')
    def fast_func_cov(values, shape):
        result = np.empty(len(values))
        # The first two rows have no complete 3-row window.
        result[0] = result[1] = np.nan
        for i in range(2, len(values)):
            # Covariance of the columns over rows i-2..i.
            cov_mat = np.cov(values[i-2:i+1,:].T)
            result[i] = np.dot(shape.T,np.dot(cov_mat, shape))
        return result

    # The signature requires a C-contiguous float64 array and float64 weights.
    values = np.ascontiguousarray(df.values)
    df_result['result'] = fast_func_cov(values, shape.astype(np.float64))
    

    On my machine, the computation takes 0.016 ms compared to 7 ms for the original function, which is about 440 times faster. That said, the Pandas assignment takes another 0.032 ms, so the code is about 150 times faster overall.
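
    A loop-free alternative, offered as a sketch rather than as part of the measured solution above: for any weight vector w, the quadratic form w' Cov(X) w equals Var(X w). The rolling result is therefore the rolling sample variance of the weighted row sums, and with `shape` all ones it is simply the rolling variance of each row's sum. The helper name `rolling_quadratic_cov` and the seeded random data are illustrative assumptions. Both `rolling(...).cov()` and `rolling(...).var()` default to `ddof=1`, which is why the two approaches should agree.

```python
import numpy as np
import pandas as pd


def rolling_quadratic_cov(df, weights, window=3):
    """Rolling w' Cov w, computed as the rolling variance of df @ w.

    Hypothetical helper: relies on the identity w' Cov(X) w = Var(X w).
    """
    w = np.asarray(weights, dtype=np.float64)
    projected = df.dot(w)  # one weighted sum per row (a Series)
    rolled = projected.rolling(window, min_periods=window).var()
    return rolled.to_frame('result')


# Illustrative data; the seed is an assumption made for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((9, 3)), columns=['A', 'B', 'C'])
shape = np.ones(df.shape[1])

df_result = rolling_quadratic_cov(df, shape)

# Cross-check against the original definition built on the rolling covariance.
df_cov = df.rolling(3, min_periods=3).cov()
reference = np.array(
    [shape @ df_cov.loc[i].to_numpy() @ shape for i in df.index]
)
np.testing.assert_allclose(
    df_result['result'].to_numpy(), reference, equal_nan=True
)
```

    This avoids building the full covariance matrix for every window, so no Numba compilation step is needed. No timing is claimed for it here.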