python-3.x pandas replace python-re

pandas: removing multiple substrings from a string


I am trying to remove a list of substrings from the values in a column, for example:

char_lst = ['1.', '1)', '2.', '2)', '3.', '3)']  # and so on for the other digits

I tried:

import re
df['X'].apply(lambda x: re.sub('|'.join(char_lst), '', re.escape(x))).astype(str)

but it raises an error:

re.error: unbalanced parenthesis at position 4


Solution

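  • The error occurs because re.escape was applied to the column values rather than to the patterns, so the ) in the joined pattern was left unescaped. Escape each pattern instead and use Series.str.replace: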

    import re
    import pandas as pd
    
    df = pd.DataFrame({'X': ['2)A', 'B', 'C', 'A', 'D', 'E', 'F', 'D', 'H', 'I1.', 'J3)']})
    
    char_lst = ['1.', '1)', '2.', '2)', '3.', '3)']
    
    # Escape each token so '.' and ')' are matched literally, then join with '|'
    pattern = "|".join(re.escape(x) for x in char_lst)
    df['X'] = df['X'].str.replace(pattern, '', regex=True)
    print(df)
        X
    0   A
    1   B
    2   C
    3   A
    4   D
    5   E
    6   F
    7   D
    8   H
    9   I
    10  J
    

    EDIT: If you need to remove any number followed by . or ), use:

    df['X'] = df['X'].str.replace(r"\d+[.)]", '', regex=True)
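
To see how the two patterns differ, here is a minimal sketch on a small Series. The sample values and the variable names (s, listed, only_listed, any_number, raw_pattern_ok) are made up for illustration and do not come from the question. It also checks that joining the unescaped tokens is what triggers re.error.

```python
import re

import pandas as pd

# Sample data, made up for illustration (not from the original question).
s = pd.Series(["2)A", "I1.", "J3)", "10.K", "4)L", "v1.5"])

char_lst = ["1.", "1)", "2.", "2)", "3.", "3)"]

# Joining the raw tokens leaves ")" unescaped, so the pattern fails to compile.
try:
    re.compile("|".join(char_lst))
    raw_pattern_ok = True
except re.error:
    raw_pattern_ok = False

# Pattern 1: only the listed tokens, each escaped so "." and ")" are literal.
listed = "|".join(re.escape(tok) for tok in char_lst)
only_listed = s.str.replace(listed, "", regex=True)

# Pattern 2: any run of digits followed by "." or ")".
any_number = s.str.replace(r"\d+[.)]", "", regex=True)

print(raw_pattern_ok)
print(only_listed.tolist())
print(any_number.tolist())
```

The escaped list leaves "10.K" and "4)L" untouched because those tokens are not in char_lst, while the \d+[.)] pattern removes them. Both patterns match anywhere in the string, so a value like "v1.5" loses its "1." as well. If only leading markers should go, anchoring the pattern with ^ is one option.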