I have a time series of covariance matrices stored as a MultiIndex (let's called it "A") and a time series of scalars stored as a DataFrame (let's call it "b").
"A" is of shape "k" x "n" x "n" and "b" is of shape "k" x "n", so I have a MultiIndex of "k" dates of covariance matrices with shape "n" x "n" that I would like to multiply by a "k" x "n" x 1 DataFrame of scalars to obtain a "k" x "n" x 1 DataFrame.
When I try something like A.multiply(b), it does not work because the MultiIndex dimensions are (kxn, n) and do not strictly match the DataFrame dimensions of (k, n).
I am able to perform the calculation using list comprehension and converting the MultiIndex and DataFrame to numpy arrays, but this takes a very, very long time, so I must be doing it very inefficiently.
An example of this is:
[np.dot( np.array( A.loc[timestamp,:] ), np.array( b.loc[timestamp,:] ) ) for timestamp in b.index]
This takes an incredibly long time to compute. Is there a fast way to perform this calculation?
Here’s random-valued dataframes of size A
5×3×3 and and b
5×3:
>>> A
0 1 2
2021-08-14 11:14:00 0 0.309559 0.286455 0.080467
1 0.996128 0.666138 0.354322
2 0.113316 0.208389 0.197855
2021-08-14 11:15:00 0 0.299459 0.650561 0.481504
1 0.570308 0.298572 0.677013
2 0.476066 0.610945 0.750575
2021-08-14 11:16:00 0 0.861801 0.692752 0.046450
1 0.587891 0.389874 0.779039
2 0.009947 0.647356 0.735746
2021-08-14 11:17:00 0 0.990027 0.185747 0.286276
1 0.831238 0.474372 0.459076
2 0.885953 0.768626 0.866064
2021-08-14 11:18:00 0 0.952294 0.106072 0.477348
1 0.370116 0.646081 0.873394
2 0.439066 0.568404 0.227528
>>> b
0 1 2
2021-08-14 11:14:00 0.113316 0.208389 0.197855
2021-08-14 11:15:00 0.476066 0.610945 0.750575
2021-08-14 11:16:00 0.009947 0.647356 0.735746
2021-08-14 11:17:00 0.885953 0.768626 0.866064
2021-08-14 11:18:00 0.439066 0.568404 0.227528
If you want to multiply each row in A
by a single value of b
, i.e. the b
columns should align to the second level of A
indexes, you should use stack
:
>>> A.mul(b.stack(), axis='index')
0 1 2
2021-08-14 11:14:00 0 0.035078 0.032460 0.009118
1 0.207582 0.138816 0.073837
2 0.022420 0.041231 0.039147
2021-08-14 11:15:00 0 0.142562 0.309710 0.229227
1 0.348427 0.182411 0.413618
2 0.357323 0.458560 0.563363
2021-08-14 11:16:00 0 0.008572 0.006891 0.000462
1 0.380575 0.252387 0.504315
2 0.007318 0.476290 0.541322
2021-08-14 11:17:00 0 0.877118 0.164563 0.253627
1 0.638911 0.364615 0.352858
2 0.767292 0.665679 0.750067
2021-08-14 11:18:00 0 0.418120 0.046573 0.209587
1 0.210376 0.367235 0.496441
2 0.099900 0.129328 0.051769
On the other hand, if you want the b
columns to align with the A
columns, you could use pd.DataFrame.align()
, which returns aligned versions of both dataframes. Here A
will be unchanged as confirmed with .compare()
, and b_aligned
will have its rows repeated per second-level index to match A
’s indexing:
>>> b_aligned, A_aligned = b.align(A, level=0)
>>> A_aligned.compare(A)
Empty DataFrame
Columns: []
Index: []
>>> b_aligned
0 1 2
2021-08-14 11:14:00 0 0.113316 0.208389 0.197855
1 0.113316 0.208389 0.197855
2 0.113316 0.208389 0.197855
2021-08-14 11:15:00 0 0.476066 0.610945 0.750575
1 0.476066 0.610945 0.750575
2 0.476066 0.610945 0.750575
2021-08-14 11:16:00 0 0.009947 0.647356 0.735746
1 0.009947 0.647356 0.735746
2 0.009947 0.647356 0.735746
2021-08-14 11:17:00 0 0.885953 0.768626 0.866064
1 0.885953 0.768626 0.866064
2 0.885953 0.768626 0.866064
2021-08-14 11:18:00 0 0.439066 0.568404 0.227528
1 0.439066 0.568404 0.227528
2 0.439066 0.568404 0.227528
>>> A_aligned.mul(b_aligned)
0 1 2
2021-08-14 11:14:00 0 0.035078 0.059694 0.015921
1 0.112877 0.138816 0.070104
2 0.012840 0.043426 0.039147
2021-08-14 11:15:00 0 0.142562 0.397457 0.361405
1 0.271504 0.182411 0.508149
2 0.226639 0.373254 0.563363
2021-08-14 11:16:00 0 0.008572 0.448457 0.034176
1 0.005848 0.252387 0.573175
2 0.000099 0.419070 0.541322
2021-08-14 11:17:00 0 0.877118 0.142770 0.247933
1 0.736438 0.364615 0.397589
2 0.784913 0.590785 0.750067
2021-08-14 11:18:00 0 0.418120 0.060292 0.108610
1 0.162505 0.367235 0.198722
2 0.192779 0.323083 0.051769
I’m only guessing to what you’re really doing as you’re not specifying the axis etc., but this gives the same results as your code:
>>> A.mul(b.align(A, level=0)[0]).sum(axis='columns').unstack(1)
0 1 2
2021-08-14 11:14:00 0.110693 0.321797 0.095413
2021-08-14 11:15:00 0.901424 0.962065 1.163256
2021-08-14 11:16:00 0.491205 0.831409 0.960491
2021-08-14 11:17:00 1.267821 1.498642 2.125765
2021-08-14 11:18:00 0.587022 0.728462 0.567631