I'm trying to convert some cells in a categorical column to NaN, but when I do, the column's dtype changes to float. How can I keep the column categorical?
Here is a working example:
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype
s = pd.Series([1, 2, 2, 3, 2])
cat_type = CategoricalDtype(categories=[1, 2, 3], ordered=False)
s_cat = s.astype(cat_type)
s_cat
Gives:
0    1
1    2
2    2
3    3
4    2
dtype: category
Categories (3, int64): [1, 2, 3]
While:
def nanify(cell):
    if cell > 2:
        return np.nan
    else:
        return int(cell)
s_cat.apply(nanify)
Results in the following:
0    1.0
1    2.0
2    2.0
3    NaN
4    2.0
dtype: float64
You can keep the categorical dtype if you change the data with a vectorized assignment instead of apply. Note that to compare the values with >, the categorical must be ordered:
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype
s = pd.Series([1, 2, 2, 3, 2])
cat_type = CategoricalDtype(categories=[1, 2, 3], ordered=True)
s_cat = s.astype(cat_type)
s_cat[s_cat > 2] = pd.NA
s_cat
Output:
0      1
1      2
2      2
3    NaN
4      2
dtype: category
Categories (3, int64): [1 < 2 < 3]
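If the categories have no meaningful order, a possible alternative is to build the mask from an equality-style test and blank the matching cells with Series.where. This is only a minimal sketch, not part of the approach above: it assumes you can list the values to blank explicitly, and the names to_blank, mask and result are illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([1, 2, 2, 3, 2])
cat_type = CategoricalDtype(categories=[1, 2, 3], ordered=False)
s_cat = s.astype(cat_type)

# Hypothetical list of category values that should become NaN
to_blank = [3]

# isin() only needs equality, so it works on an unordered categorical
mask = s_cat.isin(to_blank)

# where() keeps the cells where the condition is True and puts NaN elsewhere;
# the result is still categorical and 3 remains a declared category
result = s_cat.where(~mask)
print(result.dtype)
```

Unlike the boolean-indexed assignment, where returns a new Series and leaves s_cat unchanged.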