I have a large multi-indexed data frame with many rows and columns. I want to add new columns, with names based on the original level one column names. I want to keep the level two format of the multi-index for each newly created column. The new columns calculate changes and percent changes between columns. Ideally I'd like this to be done automatically so I don't have to create new columns and column names manually.
Original:
import numpy as np
import pandas as pd
data = [[99,3,12,4,63,55]]
cols = pd.MultiIndex.from_product([['1. FY21','2. FY22','3. FY23'],['Values','Sites']])
df = pd.DataFrame(data, columns = cols)
print(df)
Desired Output:
data_new = [[99,3,-36,52,-36,1733,12,4,51,51,425,1275,63,55]]
cols_new = pd.MultiIndex.from_product([['1. FY21','FY23-FY21','FY23-FY21_ % Change','2. FY22','FY23-FY22','FY23-FY22_ % Change','3. FY23'],['Values','Sites']])
df_new = pd.DataFrame(data_new, columns = cols_new)
print(df_new)
Try:
cols = df.columns.get_level_values(0).unique()
last = df.xs(cols[-1], level=0, axis=1)
n = cols[-1].split()[-1]
all_dfs = []
for c in cols[:-1]:
o = df.xs(c, level=0, axis=1)
d = last - o
d.columns = pd.MultiIndex.from_product([[f"{n}-{c.split()[-1]}"], d.columns])
ch = (last / o - 1) * 100
ch.columns = pd.MultiIndex.from_product(
[[f"{n}-{c.split()[-1]}_% Change"], ch.columns]
)
o.columns = pd.MultiIndex.from_product([[c], o.columns])
all_dfs.extend([o, d, ch])
last.columns = pd.MultiIndex.from_product([[cols[-1]], last.columns])
all_dfs.append(last)
out = pd.concat(all_dfs, axis=1)
print(out)
Prints:
1. FY21 FY23-FY21 FY23-FY21_% Change 2. FY22 FY23-FY22 FY23-FY22_% Change 3. FY23
Values Sites Values Sites Values Sites Values Sites Values Sites Values Sites Values Sites
0 99 3 -36 52 -36.363636 1733.333333 12 4 51 51 425.0 1275.0 63 55