Search code examples
pythonpandasreplaceignore-case

Python: replace case insensitive flag doesn't work


In my dataframe I want to replace different ways of representing something with a single consistent string. Examples:

  1. Replace [COM, COMMERCIAL] with "Commercial".
  2. Replace [FALSE, False, false, NO, No, N] with N and so on.

The list of values to be replaced and the replacement value come from another dataframe and will change as I run through each column in my main dataframe.

The ideal solution that should work is:

dfPA[col] = dfPA[col].replace(f'(?i){valold}', key)

where
valold = ['COM', 'COMMERCIAL']
key = 'Commercial'

This doesn't work. Maybe because valold is a list. So I tried:

for val in valold:
    dfPA[col] = dfPA[col].replace(f'(?i){val}', key)

It still doesn't work. Any thoughts?

Note: I CANNOT use dfPA[col] = dfPA[col].str.replace(valold, key, case=False, regex=False) because as explained here it will replace substrings too. And I then instead of 'Commercial' I see 'ComCom...Commercial'


Solution

  • Either str.replace or replace can be used. Just make sure the pattern matches the start (^) and end ($) of the string for whole cell matches.

    str.replace:

    for val in valold:
        dfPA[col] = dfPA[col].str.replace(rf'^{val}$', key, case=False, regex=True)
    

    replace:

    for val in valold:
        dfPA[col] = dfPA[col].replace(rf'(?i)^{val}$', key, regex=True)
    

    *regex=False by default for replace so the regex case insensitivity modifier will not work for replace without setting regex=True as it will literally match the characters "(?i)".


    Sample Data and Output:

    import pandas as pd
    
    dfPA = pd.DataFrame({
        'col': ['COM', 'COMMERCIAL', 'COmMErCIaL', 'Something else',
                'comical']
    })
    
    valold = ['COM', 'COMMERCIAL']
    key = 'Commercial'
    col = 'col'
    for val in valold:
        dfPA[col] = dfPA[col].str.replace(rf'^{val}$', key, case=False, regex=True)
    
    print(dfPA)
    
                  col
    0      Commercial
    1      Commercial
    2      Commercial
    3  Something else
    4         comical