After pivoting around two columns with a separate value column, I want a df with multiindex columns in a specific order, like so (please ignore that multi-2 and multi-3 labels are pointless in the simplified example):
multi-1 one two
multi-2 multi-2 multi-2
multi-3 SomeText SomeText
mIndex
bar -1.788089 -0.631030
baz -1.836282 0.762363
foo -1.104848 -0.444981
qux -0.484606 -0.507772
Starting with a multiindex series of values, labelled multi-2, I create a three column df: column 1 - the serie's indexes (multi-1); column 2 - the values (multi-2); plus another column (multi-3), which I really only want for the column label. I then want to pivot this df around multi-1 and multi-3, with values multi-2. PROBLEM: The multiindex column labels MUST always be in a specific order: multi-1, multi-2, then multi-3.
import pandas as pd
import numpy as np
arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
["one", "two", "one", "two", "one", "two", "one", "two"]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["mIndex", "multi-1"])
s = pd.Series(np.random.randn(8), index=index)
s.rename("multi-2", inplace=True)
df = pd.DataFrame(s.reset_index(level=["multi-1"]))
df["multi-3"] = "SomeText"
df = df.pivot(columns={"multi-1", "multi-3"}, values=["multi-2"])
df = df.swaplevel(0,1, axis=1) # option 1: works only sometimes
# ???? how do I name the values level ????
df = df.reorder_levels("multi-1", "multi-2", "multi-3") # option 2: set fixed order
Including multi-2 in the columns during the pivot creates another level.
The .swaplevel method does not always return the same order because (I guess) the original index order is not always the same following the pivot. Can this be right?!?
To use the reorder_levels, I need to somehow set an index label for the multi-2 value level (which is currently "None", along side "Multi-1" and "Multi-3").
Is there a way to set the label during the pivot? or after the pivot in a way that doesn't use the index (which seems to change somehow)? Or another way to get the same outcome?
After pivot
, the values don't have an index name, you have to assign it:
(df.pivot(columns={'multi-1', 'multi-3'}, values=['multi-2'])
.rename_axis(columns={None: 'multi-2'})
.reorder_levels(['multi-1', 'multi-2', 'multi-3'], axis=1)
)
Output:
multi-1 one two
multi-2 multi-2 multi-2
multi-3 SomeText SomeText
mIndex
bar 0.938079 -1.051440
baz 0.263281 1.388145
foo -0.965295 0.611163
qux -1.120318 -0.529974
Alternatively: swaplevel
doesn't work consistently because you used a set
(that is unordered) in pivot
, use a list
instead:
(df.pivot(columns=['multi-1', 'multi-3'], values=['multi-2'])
.swaplevel(0, 1, axis=1)
)
NB. you can also add .rename_axis(columns={None: 'multi-2'})
if desired.
Output:
multi-1 one two
multi-2 multi-2
multi-3 SomeText SomeText
mIndex
bar 0.542184 -0.199041
baz 1.253028 -1.006294
foo 0.252699 -1.728199
qux 0.572631 -0.694103
# with more columns
# columns=['multi-1', 'multi-3', 'multi-4', 'multi-5']
multi-1 one two
multi-2 multi-2
multi-3 SomeText SomeText
multi-4 SomeText SomeText
multi-5 SomeText SomeText
mIndex
bar 0.071546 0.264463
baz 0.516355 1.594471
foo -0.194536 -1.344563
qux -0.197232 -0.845405