python pandas dataframe filter special-characters

filter special characters in a dataframe


I have the following dataframe called data:

    metrics    artists

0    0.21    ['Zhané']
2    0.14    ['Mose Allison']
3    0.87    ['水柳仙']
4    0.25    ['Shel Silverstein']

Some records in the "artists" column contain special characters. I want to build another df holding only the records that have special characters, that is, the following output:

data:

     metrics    artists

0    0.14    ['Mose Allison']
1    0.25    ['Shel Silverstein']

data2:

     metrics    artists

0    0.21    ['Zhané']
1    0.87    ['水柳仙']

I tried:

    data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]

but I get the original df back.

I also tried with:

data2 = []
for x in data['artists']:
    if x is not "[^a-zA-Z0-9 ]":
         data2[x]=data[x]
    print(data2)

but it gives me the error:

KeyError: "['Zhané']"

and with:

if x is "[^ a-zA-Z0-9]"

it returns no records.


Solution

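  • You tried:

    data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]

    and got the original df back.

    The character class "[^a-zA-Z0-9]" does not include a space, so any name that contains a space (for example 'Mose Allison') is also treated as having a special character, which is why you're getting the original df. Add a space inside the brackets: "[^a-zA-Z0-9 ]". Tested with Python3 in a Jupyter notebook.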
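    A minimal sketch of the split, under one assumption: the "artists" values are plain strings that only look like lists (the KeyError: "['Zhané']" in the question suggests this). In that case the brackets, quotes and commas are themselves outside a-zA-Z0-9, so the sketch allows them as well as the space. The sample frame is rebuilt from the question, and the name data_plain is hypothetical.

```python
import pandas as pd

# Sample frame rebuilt from the question. Assumption: each artists value is a
# string such as "['Zhané']", not an actual Python list.
data = pd.DataFrame(
    {
        "metrics": [0.21, 0.14, 0.87, 0.25],
        "artists": [
            "['Zhané']",
            "['Mose Allison']",
            "['水柳仙']",
            "['Shel Silverstein']",
        ],
    },
    index=[0, 2, 3, 4],
)

# Anything that is NOT an ASCII letter, digit, space, or list punctuation
# ([ ] ' " ,) counts as a special character here.
pattern = r"""[^a-zA-Z0-9 \[\]'",]"""

# Boolean mask: True where the artists string contains a special character.
mask = data["artists"].str.contains(pattern, regex=True)

# Index the DataFrame (not the column) with the mask to keep every column.
data2 = data[mask].reset_index(drop=True)        # rows with special characters
data_plain = data[~mask].reset_index(drop=True)  # hypothetical name: the rest

print(data2)
print(data_plain)
```

    Two notes on the other attempts in the question. data.artists[...] returns only the artists Series, whereas data[mask] keeps the metrics column too. In the loop, x is not "[^a-zA-Z0-9 ]" compares the value against the pattern text itself and never applies the regex, and data[x] then looks for a column named after the artist, which is where the KeyError comes from.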