I have a pandas DataFrame like the one below, where the index is a `pd.DatetimeIndex` and each column is a time series.
| | x_1 | x_2 | x_3 |
|---|---|---|---|
| 2020-08-17 | 133.23 | 2457.45 | -4676 |
| 2020-08-18 | -982 | -6354.56 | -245.657 |
| 2020-08-19 | 5678.642 | 245.2786 | 2461.785 |
| 2020-08-20 | -2394 | 154.34 | -735.653 |
| 2020-08-20 | 236 | -8876 | -698.245 |
I need to calculate the Euclidean distance between every pair of columns, i.e., (x_1 - x_2), (x_1 - x_3), (x_2 - x_3), and return a square data frame like this (note that the values in this table are placeholders, not the actual Euclidean distances):
| | x_1 | x_2 | x_3 |
|---|---|---|---|
| x_1 | 0 | 123 | 456 |
| x_2 | 123 | 0 | 789 |
| x_3 | 456 | 789 | 0 |
I tried this resource, but I could not figure out how to pass the columns of my df. If I understand correctly, the example treats the rows as the series to calculate the Euclidean distance between.
An explicit way of achieving this would be:
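from itertools import combinations

import numpy as np
import pandas as pd

# Square frame with the column names on both axes; every cell starts as NaN.
dist_df = pd.DataFrame(index=df.columns, columns=df.columns)

# Visit each unordered pair of columns once and fill both symmetric cells.
for col_a, col_b in combinations(df.columns, 2):
    dist = np.linalg.norm(df[col_a] - df[col_b])
    dist_df.loc[col_a, col_b] = dist
    dist_df.loc[col_b, col_a] = dist

print(dist_df)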
outputs
              x_1           x_2           x_3
x_1           NaN  12381.858429   6135.306973
x_2  12381.858429           NaN  12680.121047
x_3   6135.306973  12680.121047           NaN
If you want `0` instead of `NaN` on the diagonal, use `DataFrame.fillna`:
dist_df.fillna(0, inplace=True)
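For wider frames, the same distances can probably be computed without a Python loop. The following is only a sketch of one alternative, using NumPy broadcasting on the transposed values so that each column of df is treated as one series. The sample data is copied from the question; the names `values`, `diff`, and `dist` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "x_1": [133.23, -982, 5678.642, -2394, 236],
        "x_2": [2457.45, -6354.56, 245.2786, 154.34, -8876],
        "x_3": [-4676, -245.657, 2461.785, -735.653, -698.245],
    },
    index=pd.DatetimeIndex(
        ["2020-08-17", "2020-08-18", "2020-08-19", "2020-08-20", "2020-08-20"]
    ),
)

# Transpose so that each row of `values` holds one column (series) of df.
values = df.to_numpy(dtype=float).T  # shape: (n_columns, n_rows)

# Broadcasting yields every pairwise difference between series,
# with shape (n_columns, n_columns, n_rows).
diff = values[:, None, :] - values[None, :, :]

# Euclidean norm over the time axis gives the square distance matrix.
dist = np.sqrt((diff ** 2).sum(axis=-1))

dist_df = pd.DataFrame(dist, index=df.columns, columns=df.columns)
print(dist_df)
```

With this approach the diagonal is already 0 and the result has a float dtype, so no fillna step is needed. The trade-off is that the intermediate `diff` array grows with the square of the number of columns, so the explicit loop may be the better choice when there are many columns.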