I have a dataframe containing multiple columns out of which 1st column is regarded as index. The remaining columns containing values that I want to sort by absolute key and thereafter create new dataframe indicating original dataframe location in terms of index and column of the sorted values. Highly appreciate if somebody can shade some light on how to do it in more pythonic way.
import polars as pl
import numpy as np
df = pl.DataFrame({
"name": ["a", "b", "c", "d", "e", "f"],
"val1": [1.2, -2.3, 3, -3.3, 2.2, -1.3],
"val2": [5, 2, 2, -4, -3, -6]})
vals = df[df.columns[1:]].to_numpy()
sorted_vals = sorted(tuple(vals.reshape(-1,)), key=abs)[::-1]
data = []
for sv in sorted_vals:
i, c = int(np.where(vals==sv)[0][0]), int(np.where(vals==sv)[1][0])
data.append([sv, df[i,'name'], df.columns[1+c]])
new_df = pl.DataFrame(data=data, orient='row', schema=['val', 'name', 'col'])
print(new_df)
# shape: (12, 3)
# ┌──────┬──────┬──────┐
# │ val ┆ name ┆ col │
# │ --- ┆ --- ┆ --- │
# │ f64 ┆ str ┆ str │
# ╞══════╪══════╪══════╡
# │ -6.0 ┆ f ┆ val2 │
# │ 5.0 ┆ a ┆ val2 │
# │ -4.0 ┆ d ┆ val2 │
# │ -3.3 ┆ d ┆ val1 │
# │ … ┆ … ┆ … │
# │ 2.0 ┆ b ┆ val2 │
# │ 2.0 ┆ b ┆ val2 │
# │ -1.3 ┆ f ┆ val1 │
# │ 1.2 ┆ a ┆ val1 │
# └──────┴──────┴──────┘
You should unpivot
and sort
(with an abs
expression for the absolute value):
out = (df.unpivot(index='name', variable_name='col', value_name='val')
.sort(pl.col('val').abs(), descending=True)
)
Output:
shape: (12, 3)
┌──────┬──────┬──────┐
│ name ┆ col ┆ val │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 │
╞══════╪══════╪══════╡
│ f ┆ val2 ┆ -6.0 │
│ a ┆ val2 ┆ 5.0 │
│ d ┆ val2 ┆ -4.0 │
│ d ┆ val1 ┆ -3.3 │
│ … ┆ … ┆ … │
│ b ┆ val2 ┆ 2.0 │
│ c ┆ val2 ┆ 2.0 │
│ f ┆ val1 ┆ -1.3 │
│ a ┆ val1 ┆ 1.2 │
└──────┴──────┴──────┘