Search code examples
pythonpandasnan

Convert zeros and ones to bool while preserving pd.NA in a multi column index dataframe


I have a multi column index dataframe. Some column headers might have pd.NA values.

The actual values in the dataframe might be zero, one, or pd.NA.

How can I transform all zeros and ones into bool while preserving the pd.NA values?

import pandas as pd

idx_l1 = ("a", "b")
idx_l2 = (pd.NA, pd.NA)
idx_l3 = ("c", "c")

df = pd.DataFrame(
    data=[
        [1, pd.NA, 0, pd.NA, 0, 1, pd.NA, pd.NA],
        [pd.NA, 0, 1, pd.NA, pd.NA, pd.NA, 0, 0],
        [0, 1, 1, 1, 0, pd.NA, pd.NA, 0],
    ],
    columns=pd.MultiIndex.from_product([idx_l1, idx_l2, idx_l3]),
)
df = df.rename_axis(["level1", "level2", "level3"], axis=1)

print(df)

level1     a                    b                  
level2   NaN                  NaN                  
level3     c     c  c     c     c     c     c     c
0          1  <NA>  0  <NA>     0     1  <NA>  <NA>
1       <NA>     0  1  <NA>  <NA>  <NA>     0     0
2          0     1  1     1     0  <NA>  <NA>     0

Solution

  • You can use the .replace method:

    df = df.replace({1: True, 0: False})
    
    print(df)
    

    Output:

    level1      a                          b                    
    level2    NaN                        NaN                    
    level3      c      c      c     c      c     c      c      c
    0        True   <NA>  False  <NA>  False  True   <NA>   <NA>
    1        <NA>  False   True  <NA>   <NA>  <NA>  False  False
    2       False   True   True  True  False  <NA>   <NA>  False