I have datafarme as below.
ID COUNTRY GENDER AGE V1 V2 V3 V4 V5
1 1 1 53 APPLE apple bosck APPLE123 xApple111t
2 2 2 51 BEKO beko SIMSUNG SamsungO123 ttBeko111t
3 3 1 24 SAMSUNG bosch SEMSUNG BOSC1123 uuSAMSUNG111t
I want to replace to np.nan if there are same value in list or contain specific value. I tried below but occurred error.
remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']
df.iloc[:,4:].str.replace(remove_list, np.nan, case=False) # exact match & case sensitive
df.iloc[:,4:].str.contains(remove_contain_list, np.nan, case=False) # contain & case sensitive
How can I solve these problems?
You can create MultiIndex Series
by DataFrame.stack
, get masks for exact and partial matches by Series.isin
with lowercase values and Series.str.contains
, replace by Series.mask
(default value for replace is NaN
, so no necessary specify) and last Series.unstack
and assign back:
remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']
s = df.iloc[:,4:].stack(dropna=False)
m1 = s.str.lower().isin([x.lower() for x in remove_list])
m2 = s.str.contains('|'.join(remove_contain_list), case=False)
s = s.mask(m1 | m2)
df.iloc[:,4:] = s.unstack()
print (df)
ID COUNTRY GENDER AGE V1 V2 V3 V4 V5
0 1 1 1 53 NaN NaN NaN APPLE123 xApple111t
1 2 2 2 51 NaN NaN NaN NaN ttBeko111t
2 3 3 1 24 NaN NaN NaN NaN NaN
EDIT: You can replace mask to background color if match in Styler.apply
:
def color(x):
c1 = 'background-color: yellow'
c = ''
remove_list = ['APPLE', 'BEKO']
remove_contain_list = ['SUNG', 'bosc']
s = x.iloc[:,4:].stack(dropna=False)
m1 = s.str.lower().isin([i.lower() for i in remove_list])
m2 = s.str.contains('|'.join(remove_contain_list), case=False)
m = m1| m2
df1 = pd.DataFrame(c, index=x.index, columns=x.columns)
mask = m.unstack(fill_value=False).reindex(x.columns, fill_value=False, axis=1)
df1 = df1.mask(mask, c1)
return df1
df.style.apply(color,axis=None)