In my dataframe I want to replace different ways of representing something with a single consistent string. Examples:
The list of values to be replaced and the replacement value come from another dataframe and will change as I run through each column in my main dataframe.
The ideal solution that should work is:
dfPA[col] = dfPA[col].replace(f'(?i){valold}', key)
where
valold = ['COM', 'COMMERCIAL']
key = 'Commercial'
This doesn't work. Maybe because valold is a list. So I tried:
for val in valold:
dfPA[col] = dfPA[col].replace(f'(?i){val}', key)
It still doesn't work. Any thoughts?
Note: I CANNOT use dfPA[col] = dfPA[col].str.replace(valold, key, case=False, regex=False)
because as explained here it will replace substrings too. And I then instead of 'Commercial' I see 'ComCom...Commercial'
Either str.replace
or replace
can be used. Just make sure the pattern matches the start (^
) and end ($
) of the string for whole cell matches.
str.replace
:
for val in valold:
dfPA[col] = dfPA[col].str.replace(rf'^{val}$', key, case=False, regex=True)
replace
:
for val in valold:
dfPA[col] = dfPA[col].replace(rf'(?i)^{val}$', key, regex=True)
*regex=False
by default for replace
so the regex
case insensitivity modifier will not work for replace without setting regex=True
as it will literally match the characters "(?i)".
Sample Data and Output:
import pandas as pd
dfPA = pd.DataFrame({
'col': ['COM', 'COMMERCIAL', 'COmMErCIaL', 'Something else',
'comical']
})
valold = ['COM', 'COMMERCIAL']
key = 'Commercial'
col = 'col'
for val in valold:
dfPA[col] = dfPA[col].str.replace(rf'^{val}$', key, case=False, regex=True)
print(dfPA)
col
0 Commercial
1 Commercial
2 Commercial
3 Something else
4 comical