Search code examples
pythonpandasmulti-index

How to replace an individual level in a multi-level column index in pandas


Consider the following multi-level column index dataframe:

import numpy as np
import pandas as pd

arrays = [
    ["A", "A", "B", "B"],
    ["one", "two", "one", "two"],
    ["1", "2", "1", "pd.NA"],
]
idx = pd.MultiIndex.from_arrays(arrays, names=["level_0", "level_1", "level_2"])
data = np.random.randn(3, 4)
df = pd.DataFrame(data, columns=idx)
print(df)

level_0         A                   B          
level_1       one       two       one       two
level_2         1         2         1     pd.NA
0       -1.249285  0.314225  0.011139  0.675274
1       -0.654808 -0.492350  0.596338 -0.087334
2        0.113570  0.566687 -0.361334  0.085368

level_2 holds values of type object (str really).

df.columns.get_level_values(2)
Index(['1', '2', '1', 'pd.NA'], dtype='object', name='level_2')

I need to parse it to the correct data type and change this particular column level.

new_level_2 = [
    pd.NA if x == "pd.NA" else int(x) for x in df.columns.get_level_values(2)
]

I am looking for a pythonic way to replace the old level_2 with new_level_2.


Solution

  • You could convert the MultiIndex.to_frame then back to MultiIndex.from_frame, change the values with replace and as_type:

    df.columns = pd.MultiIndex.from_frame(df.columns.to_frame()
                                          .replace({'level_2': {'pd.NA': pd.NA}})
                                          .astype({'level_2': 'Int64'}))
    

    Output:

    level_0         A                   B          
    level_1       one       two       one       two
    level_2         1         2         3      <NA>
    0        0.144044  1.454274  0.761038  0.121675
    1        0.443863  0.333674  1.494079 -0.205158
    2        0.313068 -0.854096 -2.552990  0.653619