Given a dataset as follows:
   id  company name  value
0   1    Finl Corp.      7
1   2  Fund Tr Corp      6
2   3   Inc Invt Fd      5
3   4  Govt Fd Inc.      3
4   5   Trinity Inc      5
Or:
[{'id': 1, 'company name': 'Finl Corp.', 'value': 7},
{'id': 2, 'company name': 'Fund Tr Corp', 'value': 6},
{'id': 3, 'company name': 'Inc Invt Fd', 'value': 5},
{'id': 4, 'company name': 'Govt Fd Inc.', 'value': 3},
{'id': 5, 'company name': 'Trinity Inc', 'value': 5}]
I need to remove the suffix from the company name column when its content ends with one of ['Corp.', 'Corp', 'Inc.', 'Inc'] and, at the same time, value is >= 5.
The expected result will be:
   id  company name  value
0   1          Finl      7
1   2       Fund Tr      6
2   3   Inc Invt Fd      5
3   4  Govt Fd Inc.      3
4   5       Trinity      5
How could I achieve that with Pandas and regex?
My trial code raises TypeError: replace() missing 1 required positional argument: 'repl':
mask = (df1['value'] >= 5)
df1.loc[mask, 'company_name_concise']= df1.loc[mask, 'company name'].str.replace(r'\bCorp.|Corp|Inc.|Inc$', regex=True)
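You can strip the suffixes with a regex: prefix each one with \s* to also remove the preceding whitespace, and append $ to anchor the match at the end of the string. Pass an empty string as the replacement; that is the repl argument the TypeError reports as missing: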
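import re

mask = (df1['value'] >= 5)
L = ['Corp.', 'Corp', 'Inc.', 'Inc']
# re.escape keeps the "." in 'Corp.' and 'Inc.' literal; the raw f-string keeps \s intact
pat = '|'.join(rf'\s*{re.escape(x)}$' for x in L)
df1.loc[mask, 'company name'] = df1.loc[mask, 'company name'].str.replace(pat, '', regex=True)
print(df1)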
   id  company name  value
0   1          Finl      7
1   2       Fund Tr      6
2   3   Inc Invt Fd      5
3   4  Govt Fd Inc.      3
4   5       Trinity      5
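
For reference, here is a self-contained sketch of the same idea, built from the sample data in the question. It differs from the pattern above in two ways, both of which are my own assumptions about what is wanted: the suffixes are grouped into one alternation, and \s+ is used instead of \s* so that a suffix is only removed when it is a separate word (with \s*, a hypothetical name such as 'Zinc' would lose its trailing 'inc'-like ending if it matched a suffix).

```python
import re
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame([
    {'id': 1, 'company name': 'Finl Corp.', 'value': 7},
    {'id': 2, 'company name': 'Fund Tr Corp', 'value': 6},
    {'id': 3, 'company name': 'Inc Invt Fd', 'value': 5},
    {'id': 4, 'company name': 'Govt Fd Inc.', 'value': 3},
    {'id': 5, 'company name': 'Trinity Inc', 'value': 5},
])

suffixes = ['Corp.', 'Corp', 'Inc.', 'Inc']

# Escape each suffix so "." is literal, require at least one whitespace
# character before it (assumption: suffixes are separate words), and
# anchor the whole alternation at the end of the string.
pat = r'\s+(?:' + '|'.join(re.escape(s) for s in suffixes) + r')$'

# Only rows with value >= 5 are modified; the others keep their name.
mask = df1['value'] >= 5
df1.loc[mask, 'company name'] = (
    df1.loc[mask, 'company name'].str.replace(pat, '', regex=True)
)

print(df1)
```

Row 3 ('Govt Fd Inc.') keeps its suffix because its value is below 5, and row 2 ('Inc Invt Fd') is unchanged because 'Inc' is not at the end of the string.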