Tags: python, pandas, categorical-data

Case-insensitive pandas.Series.replace


I want to replace some values in categorical data columns with np.nan. What is the best method for replacing values in a case-insensitive manner while maintaining the same categories (in the same order)?

import pandas as pd 
import numpy as np 

# set up a DF with ordered categories
values = ['one','two','three','na','Na','NA']
df = pd.DataFrame({
    'categ' : values
})
# astype('category') would sort the categories, so set the order explicitly
df['categ'] = pd.Categorical(df['categ'], categories=values)


# replace values
df['categ'].replace(
    to_replace='na',
    value=np.nan
)

Solution

  • One option is to do the replacement before converting the column to category: lowercase the strings, then replace 'na' with np.nan

    import pandas as pd 
    import numpy as np 
    
    # set up a DF with ordered categories
    values = ['one','two','three','na','Na','NA']
    df = pd.DataFrame({
        'categ' : values
    })
    
    
    # lowercase every value, then replace 'na' with NaN
    df['categ'] = df['categ'].str.lower().replace('na', np.nan)
    

    Output

      categ
    0    one
    1    two
    2  three
    3    NaN
    4    NaN
    5    NaN