I'm trying to rank order (row by row), from greatest to least, a multi-level column dataframe by specific columns (b_1 in this example). I'm hoping to do it in a way that would be dynamic, allowing for additional columns (both level 1 and level 2 in this example) and rows. Thank you.
I'm not sure if the best way would be to put columns b_1 into a separate dataframe, rank, and then drop back into the existing or if there is a way to add directly into the existing dataframe.
Here's an example of the kind of data I'm working with.
import pandas as pd
import numpy as np
level_2 = ['a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1', 'a_1', 'b_1', 'c_1']
level_1 = ['A1', 'A1', 'A1', 'B1', 'B1', 'B1', 'C1', 'C1', 'C1']
data = [['a', 23, 'h', 'o', 45, 'v', 'a3', 1, 'b1'], ['b', 34, 'i', 'p', 3, 'w', 'a4', 32, 'b2'], ['c', 5, 'j', 'q', 7, 'x', 'a5', 6, 'b3'], ['d', 2, 'k', 'r', 5, 'y', 'a6', 76, 'b4'], ['e', 78, 'l', 's', 65, 'z', 'a7', 9, 'b5'], ['f', 98, 'm', 't', 23, 'a1', 'a8', 14, 'b6'], ['g', 3, 'n', 'u', 1, 'a2', 'a9', 45, 'b7']]
columns = pd.MultiIndex.from_tuples(list(zip(level_1, level_2)))
df1 = pd.DataFrame(data, columns=columns)
date = ['1/1/2023','1/2/2023','1/3/2023','1/4/2023','1/5/2023','1/6/2023','1/7/2023']
df1.insert(0, 'date', date)
df1.set_index('date', inplace=True)
As a good start, you can try this :
ranks = (
df.xs("b_1", axis=1, level=1, drop_level=False)
.rank(axis=1, ascending=False)
.rename(columns={"b_1": "NEW"})
.astype(int)
)
out = pd.concat([df1, ranks], axis=1).sort_index(axis=1, level=0)
Output :
print(out)
A1 B1 C1
NEW a_1 b_1 c_1 NEW a_1 b_1 c_1 NEW a_1 b_1 c_1
date
1/1/2023 2 a 23 h 1 o 45 v 3 a3 1 b1
1/2/2023 1 b 34 i 3 p 3 w 2 a4 32 b2
1/3/2023 3 c 5 j 1 q 7 x 2 a5 6 b3
1/4/2023 3 d 2 k 2 r 5 y 1 a6 76 b4
1/5/2023 1 e 78 l 2 s 65 z 3 a7 9 b5
1/6/2023 1 f 98 m 2 t 23 a1 3 a8 14 b6
1/7/2023 2 g 3 n 3 u 1 a2 1 a9 45 b7