I have a multi column index dataframe. Some column headers might have pd.NA
values.
The actual values in the dataframe might be zero, one, or pd.NA
.
How can I transform all zeros and ones into bool
while preserving the pd.NA
values?
import pandas as pd
idx_l1 = ("a", "b")
idx_l2 = (pd.NA, pd.NA)
idx_l3 = ("c", "c")
df = pd.DataFrame(
data=[
[1, pd.NA, 0, pd.NA, 0, 1, pd.NA, pd.NA],
[pd.NA, 0, 1, pd.NA, pd.NA, pd.NA, 0, 0],
[0, 1, 1, 1, 0, pd.NA, pd.NA, 0],
],
columns=pd.MultiIndex.from_product([idx_l1, idx_l2, idx_l3]),
)
df = df.rename_axis(["level1", "level2", "level3"], axis=1)
print(df)
level1 a b
level2 NaN NaN
level3 c c c c c c c c
0 1 <NA> 0 <NA> 0 1 <NA> <NA>
1 <NA> 0 1 <NA> <NA> <NA> 0 0
2 0 1 1 1 0 <NA> <NA> 0
You can use the .replace
method:
df = df.replace({1: True, 0: False})
print(df)
Output:
level1 a b
level2 NaN NaN
level3 c c c c c c c c
0 True <NA> False <NA> False True <NA> <NA>
1 <NA> False True <NA> <NA> <NA> False False
2 False True True True False <NA> <NA> False