Given a time series of returns of 3 assets like this:
Asset 1 Asset 2 Asset 3
date
2018-08-09 0.004475 0.008847 0.006195
2018-08-10 0.025081 0.006637 0.013696
2018-08-13 -0.003005 0.011540 0.003559
2018-08-14 -0.012873 -0.009074 -0.005245
2018-08-15 0.003383 0.004275 0.000268
...
I need to calculate a matrix that indicates the probability that, if one asset is at the tail of its return distribution, the other is too. I need an output like this:
Asset1 Asset2 Asset3
Asset1 1.000000 0.00000 0.073171
Asset2 0.000000 1.00000 0.048780
Asset3 0.046875 0.04878 1.000000
The interpretation is: if Asset 1 is at its tail, Asset 3 has a 4.6% probability of being at its tail too. The code that I wrote is this:
import pandas as pd

def dummy_tail_dependence(r):
    """r is the dataframe of time series returns in the format presented."""
    # Keep only returns below each asset's 2.5% quantile; everything else becomes NaN.
    x = r[r < r.quantile(q=0.025)]
    df = pd.DataFrame(index=r.columns)
    for asset in list(r.columns):
        # Rows where `asset` is in its tail; count() then gives, per column,
        # how often each other asset is in its tail on those same rows.
        df[asset] = (x.dropna(subset=[asset]).count()) / (x[asset].count())
    return df
I do not want to depend on loops to achieve the desired output. Is there a more numerical way of doing this using pandas or numpy, with matrix operations?
Your example is very unrepresentative (there is only one row where any asset is in its tail), but the following solution may still be of help.
Start by finding all cells where the corresponding asset is in its tail:
in_tail = df < df.quantile(q=0.025)
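With the five sample rows from the question, only the 2018-08-14 row falls below each asset's 2.5% quantile, which is what makes the example so degenerate. Here is a minimal reproduction of this step (the DataFrame construction is mine; the values are copied from the question):

import pandas as pd

df = pd.DataFrame(
    {"Asset 1": [0.004475, 0.025081, -0.003005, -0.012873, 0.003383],
     "Asset 2": [0.008847, 0.006637, 0.011540, -0.009074, 0.004275],
     "Asset 3": [0.006195, 0.013696, 0.003559, -0.005245, 0.000268]},
    index=pd.to_datetime(["2018-08-09", "2018-08-10", "2018-08-13",
                          "2018-08-14", "2018-08-15"]),
)
in_tail = df < df.quantile(q=0.025)
# Each column is True only on 2018-08-14: with five rows, the
# interpolated 2.5% quantile sits just above each column's minimum.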
Now, for each asset, take the mean of every asset's tail indicator over the rows where that asset is in its tail. The mean of a boolean series is the fraction of True values, i.e. the probability of the series being true:
probs = pd.concat([in_tail[in_tail[col]].mean()
                   for col in in_tail], axis=1)
probs.columns = probs.index
# Asset1 Asset2 Asset3
#Asset1 1.0 1.0 1.0
#Asset2 1.0 1.0 1.0
#Asset3 1.0 1.0 1.0
(Yes, the example is poor: each asset's single tail observation falls on the same date, 2018-08-14, so every conditional probability comes out as 1.0.)
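If you want to drop the list comprehension entirely and use matrix operations, as the question asks, the same conditional probabilities can be obtained from one boolean matrix product. This is a sketch rather than a drop-in replacement: `tail_dependence` is a name I am introducing here, and it assumes every asset has at least one tail observation (otherwise you divide by zero):

import numpy as np
import pandas as pd

def tail_dependence(r, q=0.025):
    """Tail-dependence matrix without explicit loops."""
    in_tail = (r < r.quantile(q=q)).astype(int)
    # (assets x assets) counts of days on which both assets are in
    # their tails; the diagonal holds each asset's own tail count.
    joint = in_tail.T @ in_tail
    # Divide column j by asset j's tail count, so that entry (i, j)
    # is P(asset i in tail | asset j in tail).
    return joint / np.diag(joint)

Because `joint` is symmetric, dividing by the diagonal along the columns conditions on the column asset, matching the orientation of the matrix in the question.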