Tags: python-3.x, pandas, dataframe, python-re

Remove specific trailing substrings from a string column, conditional on another column, with Pandas


Given a dataset as follows:

   id  company name  value
0   1    Finl Corp.      7
1   2  Fund Tr Corp      6
2   3   Inc Invt Fd      5
3   4  Govt Fd Inc.      3
4   5   Trinity Inc      5

Or, as a list of records:

[{'id': 1, 'company name': 'Finl Corp.', 'value': 7},
 {'id': 2, 'company name': 'Fund Tr Corp', 'value': 6},
 {'id': 3, 'company name': 'Inc Invt Fd', 'value': 5},
 {'id': 4, 'company name': 'Govt Fd Inc.', 'value': 3},
 {'id': 5, 'company name': 'Trinity Inc', 'value': 5}]

I need to remove the suffix when the company name column ends with one of ['Corp.', 'Corp', 'Inc.', 'Inc'] and, at the same time, value is >= 5.

The expected result will be:

   id  company name  value
0   1          Finl      7
1   2       Fund Tr      6
2   3   Inc Invt Fd      5
3   4  Govt Fd Inc.      3
4   5       Trinity      5

How could I achieve that with Pandas and regex?

My trial code, which raises TypeError: replace() missing 1 required positional argument: 'repl':

mask = (df1['value'] >= 5)
df1.loc[mask, 'company_name_concise']= df1.loc[mask, 'company name'].str.replace(r'\bCorp.|Corp|Inc.|Inc$', regex=True)

Solution

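  • The error occurs because str.replace was called without the replacement string (repl). Pass an empty string as repl, and build the pattern by adding \s* before each suffix to consume the preceding spaces and $ after it to anchor the match at the end of the string: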

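    import re

    mask = (df1['value'] >= 5)

    L = ['Corp.', 'Corp', 'Inc.', 'Inc']
    # Raw f-string avoids the invalid '\s' escape; re.escape makes '.' literal
    pat = '|'.join(rf'\s*{re.escape(x)}$' for x in L)

    # Replace only in rows where the mask is True
    df1.loc[mask, 'company name'] = df1.loc[mask, 'company name'].str.replace(pat, '', regex=True)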
    
    print (df1)
       id  company name  value
    0   1          Finl      7
    1   2       Fund Tr      6
    2   3   Inc Invt Fd      5
    3   4  Govt Fd Inc.      3
    4   5       Trinity      5
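
For reference, here is a self-contained sketch of the same approach, rebuilt from the sample data in the question. It assumes the suffixes should be matched literally, so each one goes through re.escape, and it folds the alternatives into one non-capturing group so that \s* and $ apply to all of them. The names suffixes and pattern are illustrative and not part of the original answer.

```python
import re

import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame([
    {'id': 1, 'company name': 'Finl Corp.', 'value': 7},
    {'id': 2, 'company name': 'Fund Tr Corp', 'value': 6},
    {'id': 3, 'company name': 'Inc Invt Fd', 'value': 5},
    {'id': 4, 'company name': 'Govt Fd Inc.', 'value': 3},
    {'id': 5, 'company name': 'Trinity Inc', 'value': 5},
])

# Suffixes to strip; re.escape turns '.' into a literal dot.
suffixes = ['Corp.', 'Corp', 'Inc.', 'Inc']

# One group for all alternatives, so \s* and $ bind to each of them.
pattern = r'\s*(?:' + '|'.join(re.escape(s) for s in suffixes) + r')$'

# Only rows with value >= 5 are modified; the others keep their name.
mask = df1['value'] >= 5
df1.loc[mask, 'company name'] = (
    df1.loc[mask, 'company name'].str.replace(pattern, '', regex=True)
)

print(df1['company name'].tolist())
# ['Finl', 'Fund Tr', 'Inc Invt Fd', 'Govt Fd Inc.', 'Trinity']
```

The grouping matters: in an ungrouped pattern such as \bCorp.|Corp|Inc.|Inc$ the $ binds only to the last alternative, so once a replacement is supplied it would also match the leading "Inc " in "Inc Invt Fd".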