Search code examples
pythonpandasregexdataframe

Regex for column not producing expected output


I have this dataframe:

dfsupport = pd.DataFrame({'Date': ['8/12/2020','8/12/2020','13/1/2020','24/5/2020','31/10/2020','11/7/2020','11/7/2020'],
                          'Category': ['Table','Chair','Cushion','Table','Chair','Mats','Mats'],
                          'Sales': ['1 table','3chairs','8 cushions','3Tables','12 Chairs','12Mats','4Mats'],
                          'Paid': ['Yes','Yes','Yes','Yes','No','Yes','Yes',],
                          'Amount': ['93.78','$51.99','44.99','38.24','£29.99','29 21 only','18']
                          })

Which looks like this in table form:

          Date Category     Sales   Paid      Amount
0   8/12/2020    Table     1 table  Yes       93.78
1   8/12/2020    Chair     3chairs  Yes      $51.99
2   13/1/2020  Cushion  8 cushions  Yes       44.99
3   24/5/2020    Table     3Tables  Yes       38.24
4  31/10/2020    Chair   12 Chairs   No      £29.99
5   11/7/2020     Mats      12Mats  Yes  29 21 only
6   11/7/2020     Mats       4Mats  Yes          18

I want to remove both the string elements in the above. I've learnt to successfully replace the $ and £ with:

patternv='|'.join(re.escape(x) for x in ['$', '£'])
dfsupport['Amount'] = dfsupport['Amount'].str.replace(patternv,regex=True)

I now want to replace the entry that has "29 21 only" in the Amount column. My attempt has been:

patterns="{r'(\d{1,})\s(\d{1,2})\D+' : r'\1 \2'}"
dfsupport['Amount']=dfsupport['Amount'].str.replace(patterns,regex=True)

However my attempt leads to the error:

Traceback (most recent call last):
  File "/home/cloud/code/learning/howmany.py", line 160, in <module>
    dfsupport['Amount'] = dfsupport['Amount'].str.replace(patternv,regex=True)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cloud/.venv/lib/python3.12/site-packages/pandas/core/strings/accessor.py", line 136, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: StringMethods.replace() missing 1 required positional argument: 'repl'

How do I fix this?

I should add that I'm seeking to have the output as "29.21"

I followed the question here


Solution

  • You are missing the second (mandatory) parameter of str.replace:

    dfsupport['Amount'] = dfsupport['Amount'].str.replace(r'(\d{1,})\s(\d{1,2})\D+',
                                                          r'\1.\2', regex=True)
    

    It also seems that you're trying to use a dictionary in patterns, this is however not working the way you did it, you have to pass a single regex.

    If you want to pass a dictionary, use replace (without str.):

    patterns = {r'(\d{1,})\s(\d{1,2})\D+' : r'\1.\2'}
    dfsupport['Amount'] = dfsupport['Amount'].replace(patterns,regex=True)
    

    Output:

             Date Category       Sales Paid  Amount
    0   8/12/2020    Table     1 table  Yes   93.78
    1   8/12/2020    Chair     3chairs  Yes  $51.99
    2   13/1/2020  Cushion  8 cushions  Yes   44.99
    3   24/5/2020    Table     3Tables  Yes   38.24
    4  31/10/2020    Chair   12 Chairs   No  £29.99
    5   11/7/2020     Mats      12Mats  Yes   29.21
    6   11/7/2020     Mats       4Mats  Yes      18