I have a dataframe of categories and amounts. Categories can be nested into subcategories to any number of levels using a colon-separated string. I want to sort it by descending amount, but hierarchically, so that each parent is followed by its own subcategories, as shown.
How I need it sorted:
Transport 5000
Transport : Car 4900
Transport : Train 100
Household 1100
Household : Utilities 600
Household : Utilities : Water 400
Household : Utilities : Electric 200
Household : Cleaning 100
Household : Cleaning : Bathroom 75
Household : Cleaning : Kitchen 25
Household : Rent 400
Living 250
Living : Other 150
Living : Food 100
EDIT: The data frame:
import pandas as pd
df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Transport : Train", "Household", "Household : Utilities", "Household : Utilities : Water", "Household : Utilities : Electric", "Household : Cleaning", "Household : Cleaning : Bathroom", "Household : Cleaning : Kitchen", "Household : Rent", "Living", "Living : Other", "Living : Food"],
    "amount": [5000, 4900, 100, 1100, 600, 400, 200, 100, 75, 25, 400, 250, 150, 100],
})
Note: this is the order I want it in. The rows may be in any arbitrary order before the sort.
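For anyone reproducing the problem, here is a small sketch of how the colon-separated strings map onto levels, and how to shuffle the frame to simulate arbitrary input order. The names `levels`, `depth` and `shuffled` are only illustrative, and the three-row frame is a cut-down sample of the data above.

```python
import pandas as pd

# A cut-down sample of the question's data, one row per depth.
df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Household : Utilities : Water"],
    "amount": [5000, 4900, 400],
})

# One column per hierarchy level; shorter paths are padded with missing values.
levels = df["category"].str.split(":", expand=True)
# Remove the spaces that surround each colon in the original strings.
levels = levels.apply(lambda col: col.str.strip())

# Depth of each row = number of levels that are actually present.
depth = levels.notna().sum(axis=1)

# Shuffle the rows to simulate an arbitrary starting order.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
```

The sort then has to keep every row directly under its parent while ordering siblings by amount, which a plain sort on the amount column alone cannot do.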
EDIT2: If anyone is looking for a similar solution, I posted the one I settled on here: How to sort dataframe in pandas by value in hierarchical category structure
To answer my own question: I found a way. Kind of long-winded, but here it is.
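import numpy as np
import pandas as pd

def sort_tree_df(df, tree_column, sort_column):
    # Work on a copy so the caller's frame keeps its columns and index.
    df = df.copy()
    sort_key = sort_column + '_abs'
    df[sort_key] = df[sort_column].abs()
    # Index the rows by hierarchy level: "A : B" becomes ("A", "B").
    df.index = pd.MultiIndex.from_frame(
        df[tree_column].str.split(":").apply(lambda x: [y.strip() for y in x]).apply(pd.Series))
    # np.lexsort treats the LAST key as the primary one. Keys, from least to
    # most significant: category text, the row's own absolute amount, then the
    # largest absolute amount within each ancestor group, deepest level first.
    sort_columns = [df[tree_column].values, df[sort_key].values] + [
        df.groupby(level=list(range(0, x)))[sort_key].transform('max').values
        for x in range(df.index.nlevels - 1, 0, -1)
    ]
    sort_indexes = np.lexsort(sort_columns)
    # Reverse the ascending order to get descending amounts.
    df_sorted = df.iloc[sort_indexes[::-1]]
    # Drop the helper index and column before returning.
    return df_sorted.reset_index(drop=True).drop(columns=sort_key)

sort_tree_df(df, 'category', 'amount')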
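A shorter alternative, offered only as a sketch and not as a drop-in replacement for the function above: give each row a key made of one (negated absolute amount, name) pair per ancestor, from the root down to the row itself, and sort on that. A parent's key is a prefix of its children's keys, so the parent sorts first. The function name `sort_by_ancestor_amounts` and the `sep` parameter are made up for illustration. It assumes category paths are unique, and it treats an ancestor that has no row of its own as amount 0.

```python
import pandas as pd

def sort_by_ancestor_amounts(df, tree_column, sort_column, sep=":"):
    """Return a sorted copy: parents before children, siblings by descending |amount|."""
    # "A : B : C" -> ("A", "B", "C")
    paths = [tuple(part.strip() for part in text.split(sep))
             for text in df[tree_column]]
    # Absolute amount recorded for each full path (assumes paths are unique).
    amount_of = dict(zip(paths, df[sort_column].abs()))

    def key(path):
        # One (negated amount, name) pair per ancestor, root first.
        # Assumption: an ancestor without its own row counts as amount 0.
        return [(-amount_of.get(path[:d], 0), path[d - 1])
                for d in range(1, len(path) + 1)]

    # Sort row positions by key; ties between siblings fall back to the name.
    order = sorted(range(len(df)), key=lambda i: key(paths[i]))
    return df.iloc[order].reset_index(drop=True)

df = pd.DataFrame({
    "category": ["Transport", "Transport : Car", "Transport : Train",
                 "Household", "Household : Utilities",
                 "Household : Utilities : Water",
                 "Household : Utilities : Electric",
                 "Household : Cleaning", "Household : Cleaning : Bathroom",
                 "Household : Cleaning : Kitchen", "Household : Rent",
                 "Living", "Living : Other", "Living : Food"],
    "amount": [5000, 4900, 100, 1100, 600, 400, 200, 100, 75, 25, 400,
               250, 150, 100],
})

# Shuffle first to show that the input order does not matter.
shuffled = df.sample(frac=1, random_state=1).reset_index(drop=True)
result = sort_by_ancestor_amounts(shuffled, "category", "amount")
```

With the question's data this puts Household : Rent (400) ahead of Household : Cleaning (100) and its children, because siblings are ordered strictly by descending amount.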