
Using regex matched groups in pandas dataframe replace function


I'm just learning Python/pandas and like how powerful and concise it is.

During data cleaning I want to use replace with a regex on a dataframe column, and reinsert parts of the match (groups) in the replacement value.

Simple Example: lastname, firstname -> firstname lastname

I tried something like the following (actual case is more complex so excuse the simple regex):

df['Col1'].replace({'([A-Za-z])+, ([A-Za-z]+)' : '\2 \1'}, inplace=True, regex=True)

However, this results in empty values. The match part works as expected, but the value part doesn't. I guess this could be achieved by some splitting and merging, but I am looking for a general answer as to whether the regex group can be used in replace.


Solution

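  • I think you have a few issues with your regexes.

    As @Abdou just said, use either '\\2 \\1' or, better, the raw string r'\2 \1'. In a regular string literal, '\1' is an escape sequence for the character with ASCII code 1, not a backreference to group 1.

    Also note that in '([A-Za-z])+' the quantifier sits outside the group, so group 1 keeps only the last letter it matched; you probably meant '([A-Za-z]+)'.

    Your approach should work once the regexes are corrected: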

    In [193]: df
    Out[193]:
                  name
    0        John, Doe
    1  Max, Mustermann
    
    In [194]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1'}, regex=True)
    Out[194]:
    0          Doe John
    1    Mustermann Max
    Name: name, dtype: object
    
    In [195]: df.name.replace({r'(\w+),\s+(\w+)' : r'\2 \1', 'Max':'Fritz'}, regex=True)
    Out[195]:
    0            Doe John
    1    Mustermann Fritz
    Name: name, dtype: object
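
To see why the original attempt produced blank-looking values, it can help to test the pattern and the replacement string with Python's built-in re module before using them in replace. The following is only a minimal sketch: the sample name and all variable names are made up for illustration, and it assumes the replacement string behaves the same way here as it does in pandas.

```python
import re

name = "Mustermann, Max"  # hypothetical sample value

# In a normal string literal, "\2" and "\1" are octal escapes for the
# control characters chr(2) and chr(1), not regex backreferences.
plain = "\2 \1"
# A raw string keeps the backslashes, so the regex engine sees \2 and \1.
raw = r"\2 \1"

# Pattern from the question: the "+" is outside the first group, so that
# group is repeated and only keeps the last letter it matched.
question_pattern = r"([A-Za-z])+, ([A-Za-z]+)"
swapped_question = re.sub(question_pattern, raw, name)

# Quantifier moved inside the group, so group 1 captures the whole word.
fixed_pattern = r"([A-Za-z]+), ([A-Za-z]+)"
swapped_fixed = re.sub(fixed_pattern, raw, name)

# With the non-raw replacement, the match is replaced by control characters,
# which usually display as empty or blank text.
broken = re.sub(fixed_pattern, plain, name)

print(repr(swapped_question))
print(repr(swapped_fixed))
print(repr(broken))
```

Printing with repr makes the invisible control characters show up as escape sequences, which is a quick way to spot a missing r prefix.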