Consider the following multi-index pd.DataFrame
that has a number of missing values.
import numpy as np
import pandas as pd
# Create multi-index
index = pd.MultiIndex.from_tuples(
[
("A", "X", "I"),
("A", "X", "I"),
("A", "Y", "I"),
("A", "Y", "II"),
("A", "Y", "I"),
],
names=["level_1", "level_2", "level_3"],
)
# Create dataframe
data = [[1, np.nan], [np.nan, 1], [np.nan, 1], [np.nan, 1], [1, np.nan]]
df = pd.DataFrame(data, index=index, columns=["column1", "column2"])
print(df)
column1 column2
level_1 level_2 level_3
A X I 1.0 NaN
I NaN 1.0
Y I NaN 1.0
II NaN 1.0
I 1.0 NaN
How can I squeeze the rows as much as possible? I am looking for the following result:
column1 column2
level_1 level_2 level_3
A X I 1.0 1.0
Y I 1.0 1.0
II NaN 1.0
If possible aggregate values per indices, e.g. mean
:
df = df.groupby(level=[0,1,2]).mean()
print(df)
column1 column2
level_1 level_2 level_3
A X I 1.0 1.0
Y I 1.0 1.0
II NaN 1.0
If you want to avoid aggregation:
f = lambda x: x.apply(lambda x: x.sort_values(key=lambda z: z.isna()))
df = df.groupby(level=[0,1,2], group_keys=False).apply(f).dropna(how='all')
print(df)
column1 column2
level_1 level_2 level_3
A X I 1.0 1.0
Y I 1.0 1.0
II NaN 1.0