python pandas linear-algebra multi-index covariance-matrix

Multiply a pandas MultiIndex (k x n x n) by a DataFrame (k x n x 1)

I have a time series of covariance matrices stored as a MultiIndex (let's called it "A") and a time series of scalars stored as a DataFrame (let's call it "b").

"A" is of shape "k" x "n" x "n" and "b" is of shape "k" x "n", so I have a MultiIndex of "k" dates of covariance matrices with shape "n" x "n" that I would like to multiply by a "k" x "n" x 1 DataFrame of scalars to obtain a "k" x "n" x 1 DataFrame.

When I try something like A.multiply(b), it does not work because the MultiIndex dimensions are (kxn, n) and do not strictly match the DataFrame dimensions of (k, n).

I am able to perform the calculation using list comprehension and converting the MultiIndex and DataFrame to numpy arrays, but this takes a very, very long time, so I must be doing it very inefficiently.

An example of this is:

[np.dot( np.array( A.loc[timestamp,:] ), np.array( b.loc[timestamp,:] ) ) for timestamp in b.index]

This takes an incredibly long time to compute. Is there a fast way to perform this calculation?

Solution

Here’s random-valued dataframes of size A 5×3×3 and and b 5×3:

>>> A
                              0         1         2
2021-08-14 11:14:00 0  0.309559  0.286455  0.080467
                    1  0.996128  0.666138  0.354322
                    2  0.113316  0.208389  0.197855
2021-08-14 11:15:00 0  0.299459  0.650561  0.481504
                    1  0.570308  0.298572  0.677013
                    2  0.476066  0.610945  0.750575
2021-08-14 11:16:00 0  0.861801  0.692752  0.046450
                    1  0.587891  0.389874  0.779039
                    2  0.009947  0.647356  0.735746
2021-08-14 11:17:00 0  0.990027  0.185747  0.286276
                    1  0.831238  0.474372  0.459076
                    2  0.885953  0.768626  0.866064
2021-08-14 11:18:00 0  0.952294  0.106072  0.477348
                    1  0.370116  0.646081  0.873394
                    2  0.439066  0.568404  0.227528
>>> b
                            0         1         2
2021-08-14 11:14:00  0.113316  0.208389  0.197855
2021-08-14 11:15:00  0.476066  0.610945  0.750575
2021-08-14 11:16:00  0.009947  0.647356  0.735746
2021-08-14 11:17:00  0.885953  0.768626  0.866064
2021-08-14 11:18:00  0.439066  0.568404  0.227528

If you want to multiply each row in A by a single value of b, i.e. the b columns should align to the second level of A indexes, you should use stack:

>>> A.mul(b.stack(), axis='index')
                              0         1         2
2021-08-14 11:14:00 0  0.035078  0.032460  0.009118
                    1  0.207582  0.138816  0.073837
                    2  0.022420  0.041231  0.039147
2021-08-14 11:15:00 0  0.142562  0.309710  0.229227
                    1  0.348427  0.182411  0.413618
                    2  0.357323  0.458560  0.563363
2021-08-14 11:16:00 0  0.008572  0.006891  0.000462
                    1  0.380575  0.252387  0.504315
                    2  0.007318  0.476290  0.541322
2021-08-14 11:17:00 0  0.877118  0.164563  0.253627
                    1  0.638911  0.364615  0.352858
                    2  0.767292  0.665679  0.750067
2021-08-14 11:18:00 0  0.418120  0.046573  0.209587
                    1  0.210376  0.367235  0.496441
                    2  0.099900  0.129328  0.051769

On the other hand, if you want the b columns to align with the A columns, you could use pd.DataFrame.align(), which returns aligned versions of both dataframes. Here A will be unchanged as confirmed with .compare(), and b_aligned will have its rows repeated per second-level index to match A’s indexing:

>>> b_aligned, A_aligned = b.align(A, level=0)
>>> A_aligned.compare(A)
Empty DataFrame
Columns: []
Index: []
>>> b_aligned
                              0         1         2
2021-08-14 11:14:00 0  0.113316  0.208389  0.197855
                    1  0.113316  0.208389  0.197855
                    2  0.113316  0.208389  0.197855
2021-08-14 11:15:00 0  0.476066  0.610945  0.750575
                    1  0.476066  0.610945  0.750575
                    2  0.476066  0.610945  0.750575
2021-08-14 11:16:00 0  0.009947  0.647356  0.735746
                    1  0.009947  0.647356  0.735746
                    2  0.009947  0.647356  0.735746
2021-08-14 11:17:00 0  0.885953  0.768626  0.866064
                    1  0.885953  0.768626  0.866064
                    2  0.885953  0.768626  0.866064
2021-08-14 11:18:00 0  0.439066  0.568404  0.227528
                    1  0.439066  0.568404  0.227528
                    2  0.439066  0.568404  0.227528
>>> A_aligned.mul(b_aligned)
                              0         1         2
2021-08-14 11:14:00 0  0.035078  0.059694  0.015921
                    1  0.112877  0.138816  0.070104
                    2  0.012840  0.043426  0.039147
2021-08-14 11:15:00 0  0.142562  0.397457  0.361405
                    1  0.271504  0.182411  0.508149
                    2  0.226639  0.373254  0.563363
2021-08-14 11:16:00 0  0.008572  0.448457  0.034176
                    1  0.005848  0.252387  0.573175
                    2  0.000099  0.419070  0.541322
2021-08-14 11:17:00 0  0.877118  0.142770  0.247933
                    1  0.736438  0.364615  0.397589
                    2  0.784913  0.590785  0.750067
2021-08-14 11:18:00 0  0.418120  0.060292  0.108610
                    1  0.162505  0.367235  0.198722
                    2  0.192779  0.323083  0.051769

I’m only guessing to what you’re really doing as you’re not specifying the axis etc., but this gives the same results as your code:

>>> A.mul(b.align(A, level=0)[0]).sum(axis='columns').unstack(1)
                            0         1         2
2021-08-14 11:14:00  0.110693  0.321797  0.095413
2021-08-14 11:15:00  0.901424  0.962065  1.163256
2021-08-14 11:16:00  0.491205  0.831409  0.960491
2021-08-14 11:17:00  1.267821  1.498642  2.125765
2021-08-14 11:18:00  0.587022  0.728462  0.567631