Remove records from a pandas DataFrame subject to a condition


I have created the following pandas dataframe:

import pandas as pd

ds = {'col1':['a','/','b','c'], 'col2' : [1,2,3,4]}

df = pd.DataFrame(data=ds)
print(df)

which looks like this:

  col1  col2
0    a     1
1    /     2
2    b     3
3    c     4

I have a string of special characters, ¬`!"£$£#/\+*><@|, from which I build a regex character class like this:

import re

# raw string: keeps the backslash literally and avoids an invalid-escape warning for '\+'
chars = r'¬`!"£$£#/\+*><@|'
regex = f'[{"".join(map(re.escape, chars))}]'

From the dataframe above, I need to remove only the records for which col1 contains any of the special characters included in the regex.

From the example above, the resulting dataframe should look like this:

  col1  col2
0    a     1
1    b     3
2    c     4

Does anyone know how to do it?


Solution

  • You can use str.contains to build a boolean mask of the rows whose col1 matches the regex, then negate the mask with ~ to keep only the rows that do not match:

    df[~df.col1.str.contains(regex)]
    

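    Result (the original index labels are kept; chain .reset_index(drop=True) if you want them renumbered from 0 as in the expected output):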

      col1  col2
    0    a     1
    2    b     3
    3    c     4
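
For reference, here is a minimal end-to-end sketch that puts the pieces together. It assumes the data and character set from the question; the `na=False` argument and the names `mask` and `result` are additions for illustration, not part of the original answer.

```python
import re
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({'col1': ['a', '/', 'b', 'c'], 'col2': [1, 2, 3, 4]})

# Raw string so the backslash is kept as a literal character.
chars = r'¬`!"£$£#/\+*><@|'
regex = f'[{"".join(map(re.escape, chars))}]'

# True where col1 contains at least one special character.
# na=False (an assumption) treats missing values as "no match",
# so rows with NaN in col1 would be kept rather than raising an error.
mask = df['col1'].str.contains(regex, regex=True, na=False)

# Keep the non-matching rows and renumber the index from 0.
result = df[~mask].reset_index(drop=True)
print(result)
```

Storing the mask in its own variable also makes it easy to inspect which rows are about to be dropped (df[mask]) before filtering.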