Dropping redundant levels from a pandas MultiIndex


I have a pandas DataFrame with a MultiIndex that gets filtered interactively. The filtered frame can end up with redundant index levels, i.e. levels where every row has the same value.

Is there a way to drop these levels from the index?

Given a DataFrame like:

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]], columns=["a", "b"], index=pd.MultiIndex.from_tuples([("a", "b", "c"), ("d", "b", "e")], names=["one", "two", "three"]))
>>> df
               a  b
one two three
a   b   c      1  2
d   b   e      3  4

I would like to drop level "two", but without naming it explicitly, since I don't know beforehand which level will be redundant.

Something like this (drop_redundant is a made-up method):

>>> df.index = df.index.drop_redundant()
>>> df
           a  b
one three
a   c      1  2
d   e      3  4

Solution

  • You can convert the index to a DataFrame and count the number of unique values per level. Levels with nunique == 1 are redundant and can be dropped:

    nunique = df.index.to_frame().nunique()
    to_drop = nunique.index[nunique == 1]
    df = df.droplevel(to_drop)
    

    If you do this a lot, you can monkey-patch it onto the DataFrame class:

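    import pandas as pd

    def drop_redundant(df: pd.DataFrame, inplace=False):
        # A flat index has no levels to drop; mirror the pandas convention
        # of returning None for in-place calls
        if not isinstance(df.index, pd.MultiIndex):
            return None if inplace else df
    
        # Count distinct values per level; a single value means redundant
        nunique = df.index.to_frame().nunique()
        to_drop = nunique.index[nunique == 1]
    
        # set_index returns a new frame, or None when inplace=True
        return df.set_index(df.index.droplevel(to_drop), inplace=inplace)
    
    # The monkey patching
    pd.DataFrame.drop_redundant = drop_redundant
    
    # Usage
    df = df.drop_redundant()        # chaining
    df.drop_redundant(inplace=True) # in-place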
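
If you would rather not patch pandas, the same idea can be written as a plain function. The version below is only a sketch: the name drop_constant_levels is made up for illustration. It selects levels by position rather than by name (so unnamed levels work too), counts NaN as a value, and always keeps at least one level, because droplevel raises if asked to remove every level (which can happen when the filter leaves a single row).

```python
import pandas as pd


def drop_constant_levels(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without index levels that hold a single value.

    Hypothetical helper, not part of pandas.
    """
    idx = df.index
    if not isinstance(idx, pd.MultiIndex):
        return df.copy()

    # Collect level positions whose values are all identical.
    # dropna=False counts NaN as a value, so a level holding NaN and one
    # other value is not treated as constant.
    constant = [
        pos for pos in range(idx.nlevels)
        if idx.get_level_values(pos).nunique(dropna=False) == 1
    ]

    # droplevel cannot remove every level, so keep the last one if needed.
    if len(constant) == idx.nlevels:
        constant = constant[:-1]

    return df.droplevel(constant) if constant else df.copy()


# The frame from the question
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=["a", "b"],
    index=pd.MultiIndex.from_tuples(
        [("a", "b", "c"), ("d", "b", "e")], names=["one", "two", "three"]
    ),
)

result = drop_constant_levels(df)
print(list(result.index.names))  # ['one', 'three']
```

Since it never mutates its input, the function also chains cleanly with df.pipe(drop_constant_levels).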