Tags: python, pandas, series, categorical-data

Pandas: changing cell value to np.nan changes from categorical data to float


I'm trying to set some cells in a categorical column to NaN, but when I do, the column's dtype changes to float. How can I keep the column categorical?

Here is a working example:

import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype
s = pd.Series([1, 2, 2, 3, 2])
cat_type = CategoricalDtype(categories=[1, 2, 3], ordered=False)
s_cat = s.astype(cat_type)
s_cat

Gives:

0    1
1    2
2    2
3    3
4    2
dtype: category
Categories (3, int64): [1, 2, 3]

While:

def nanify(cell):
    if cell>2:
        return np.nan
    else:
        return int(cell)

s_cat.apply(nanify)

Results in the following:

0    1.0
1    2.0
2    2.0
3    NaN
4    2.0
dtype: float64

Solution

  • You can do this with a vectorized assignment instead of `apply`. To compare the values with `>`, the categorical must also be ordered:

    import numpy as np
    import pandas as pd
    from pandas.api.types import CategoricalDtype
    s = pd.Series([1, 2, 2, 3, 2])
    cat_type = CategoricalDtype(categories=[1, 2, 3], ordered=True)
    s_cat = s.astype(cat_type)
    
    # Boolean-mask assignment preserves the categorical dtype
    s_cat[s_cat > 2] = pd.NA
    s_cat

    Output:

    0      1
    1      2
    2      2
    3    NaN
    4      2
    dtype: category
    Categories (3, int64): [1 < 2 < 3]
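
If the categories should stay unordered, a membership test avoids the `>` comparison altogether. The following is a minimal sketch added to the answer, not part of the original solution. It assumes `Series.where`, `Series.isin`, and `Series.cat.remove_categories` behave as they do in recent pandas releases; the names `kept` and `dropped` are illustrative only.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series([1, 2, 2, 3, 2])
cat_type = CategoricalDtype(categories=[1, 2, 3], ordered=False)
s_cat = s.astype(cat_type)

# Option 1: keep only the wanted values; everything else becomes NaN.
# The dtype stays categorical and category 3 remains defined.
kept = s_cat.where(s_cat.isin([1, 2]))

# Option 2: remove the category itself; values that used it become NaN.
dropped = s_cat.cat.remove_categories([3])

print(kept.dtype, list(kept.cat.categories))
print(dropped.dtype, list(dropped.cat.categories))
```

Option 1 suits the case where 3 may legitimately appear again later, while option 2 shrinks the set of allowed categories, so choose whichever matches what the column is meant to represent.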