Search code examples
pythonpandasdataframemulti-index

Squeeze rows containing missing values in multi-index dataframe


Consider the following multi-index pd.DataFrame that has a number of missing values.

import numpy as np
import pandas as pd

# Create multi-index
index = pd.MultiIndex.from_tuples(
    [
        ("A", "X", "I"),
        ("A", "X", "I"),
        ("A", "Y", "I"),
        ("A", "Y", "II"),
        ("A", "Y", "I"),
    ],
    names=["level_1", "level_2", "level_3"],
)

# Create dataframe
data = [[1, np.nan], [np.nan, 1], [np.nan, 1], [np.nan, 1], [1, np.nan]]
df = pd.DataFrame(data, index=index, columns=["column1", "column2"])

print(df)

                         column1  column2
level_1 level_2 level_3                  
A       X       I            1.0      NaN
                I            NaN      1.0
        Y       I            NaN      1.0
                II           NaN      1.0
                I            1.0      NaN

How can I squeeze the rows as much as possible? I am looking for the following result:

                         column1  column2
level_1 level_2 level_3                  
A       X       I            1.0      1.0
        Y       I            1.0      1.0
                II           NaN      1.0

Solution

  • If possible aggregate values per indices, e.g. mean:

    df = df.groupby(level=[0,1,2]).mean()
    print(df)
                             column1  column2
    level_1 level_2 level_3                  
    A       X       I            1.0      1.0
            Y       I            1.0      1.0
                    II           NaN      1.0
    

    If you want to avoid aggregation:

    f = lambda x: x.apply(lambda x: x.sort_values(key=lambda z: z.isna()))
    
    df = df.groupby(level=[0,1,2], group_keys=False).apply(f).dropna(how='all')
    print(df)
                             column1  column2
    level_1 level_2 level_3                  
    A       X       I            1.0      1.0
            Y       I            1.0      1.0
                    II           NaN      1.0