Search code examples
pythonpandasdataframemulti-index

In pandas, how to reliably set the index order of multilevel columns during or after a pivot of two columns plus a value column


After pivoting around two columns with a separate value column, I want a df with multiindex columns in a specific order, like so (please ignore that multi-2 and multi-3 labels are pointless in the simplified example):

multi-1       one       two
multi-2   multi-2   multi-2
multi-3  SomeText  SomeText
mIndex                     
bar     -1.788089 -0.631030
baz     -1.836282  0.762363
foo     -1.104848 -0.444981
qux     -0.484606 -0.507772

Starting with a multiindex series of values, labelled multi-2, I create a three column df: column 1 - the serie's indexes (multi-1); column 2 - the values (multi-2); plus another column (multi-3), which I really only want for the column label. I then want to pivot this df around multi-1 and multi-3, with values multi-2. PROBLEM: The multiindex column labels MUST always be in a specific order: multi-1, multi-2, then multi-3.

import pandas as pd
import numpy as np

arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
          ["one", "two", "one", "two", "one", "two", "one", "two"]]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["mIndex", "multi-1"])

s = pd.Series(np.random.randn(8), index=index)
s.rename("multi-2", inplace=True)

df = pd.DataFrame(s.reset_index(level=["multi-1"]))
df["multi-3"] = "SomeText"

df = df.pivot(columns={"multi-1", "multi-3"}, values=["multi-2"])
df = df.swaplevel(0,1, axis=1)      # option 1: works only sometimes
# ???? how do I name the values level ????
df = df.reorder_levels("multi-1", "multi-2", "multi-3")  # option 2: set fixed order

Including multi-2 in the columns during the pivot creates another level.

The .swaplevel method does not always return the same order because (I guess) the original index order is not always the same following the pivot. Can this be right?!?

To use the reorder_levels, I need to somehow set an index label for the multi-2 value level (which is currently "None", along side "Multi-1" and "Multi-3").

Is there a way to set the label during the pivot? or after the pivot in a way that doesn't use the index (which seems to change somehow)? Or another way to get the same outcome?


Solution

  • After pivot, the values don't have an index name, you have to assign it:

    (df.pivot(columns={'multi-1', 'multi-3'}, values=['multi-2'])
       .rename_axis(columns={None: 'multi-2'})
       .reorder_levels(['multi-1', 'multi-2', 'multi-3'], axis=1) 
    )
    

    Output:

    multi-1       one       two
    multi-2   multi-2   multi-2
    multi-3  SomeText  SomeText
    mIndex                     
    bar      0.938079 -1.051440
    baz      0.263281  1.388145
    foo     -0.965295  0.611163
    qux     -1.120318 -0.529974
    

    Alternatively: swaplevel doesn't work consistently because you used a set (that is unordered) in pivot, use a list instead:

    (df.pivot(columns=['multi-1', 'multi-3'], values=['multi-2'])
       .swaplevel(0, 1, axis=1)
    )
    

    NB. you can also add .rename_axis(columns={None: 'multi-2'}) if desired.

    Output:

    multi-1       one       two
              multi-2   multi-2
    multi-3  SomeText  SomeText
    mIndex                     
    bar      0.542184 -0.199041
    baz      1.253028 -1.006294
    foo      0.252699 -1.728199
    qux      0.572631 -0.694103
    
    # with more columns
    # columns=['multi-1', 'multi-3', 'multi-4', 'multi-5']
    
    multi-1       one       two
              multi-2   multi-2
    multi-3  SomeText  SomeText
    multi-4  SomeText  SomeText
    multi-5  SomeText  SomeText
    mIndex                     
    bar      0.071546  0.264463
    baz      0.516355  1.594471
    foo     -0.194536 -1.344563
    qux     -0.197232 -0.845405