python pandas dataframe filter special-characters

filter special characters in a dataframe

I have the following dataframe called data:

    metrics    artists

0    0.21    ['ZhanÃ©']
2    0.14    ['Mose Allison']
3    0.87    ['水柳仙']
4    0.25    ['Shel Silverstein']

Some records in the "artists" column contain special characters. I want to move those records into another df, so that the output is:

data:

     metrics    artists

0    0.14    ['Mose Allison']
1    0.25    ['Shel Silverstein']

data2:

     metrics    artists

0    0.21    ['ZhanÃ©']
1    0.87    ['水柳仙']

I tried:

 data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]

but I get the original df back.

I also tried:

data2 = []
for x in data['artists']:
    if x is not "[^a-zA-Z0-9 ]":
         data2[x]=data[x]
    print(data2)

but it gives me the error:

KeyError: "['ZhanÃ©']"

and with:

if x is "[^ a-zA-Z0-9]"

I get empty records.

Solution

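Regarding your first attempt:

 data2=data.artists[data.artists.str.contains("[^a-zA-Z0-9]")]

The character class "[^a-zA-Z0-9]" has no space in it, so the space inside names such as 'Mose Allison' counts as a special character, every row matches, and you get the original df back. Add a space to the class: "[^a-zA-Z0-9 ]". Tested with Python3 in a Jupyter notebook.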
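
Here is a minimal sketch of the full split. It assumes the artists column holds plain strings exactly as displayed in the question, brackets and quotes included. Under that assumption the brackets, quotes and commas also need to be in the allowed set, otherwise every row would still match. Indexing data itself with the mask, rather than data.artists, keeps the metrics column. The names pattern and has_special are illustrative and not from the question.

```python
import pandas as pd

# Sample data copied from the question. Assumption: each cell is a plain
# string such as "['Mose Allison']", with the brackets and quotes included.
data = pd.DataFrame(
    {
        "metrics": [0.21, 0.14, 0.87, 0.25],
        "artists": [
            "['ZhanÃ©']",
            "['Mose Allison']",
            "['水柳仙']",
            "['Shel Silverstein']",
        ],
    }
)

# Allowed characters: ASCII letters, digits, space, and the list punctuation
# present in every cell. Any other character counts as "special".
pattern = r"[^a-zA-Z0-9 \[\]',]"
has_special = data["artists"].str.contains(pattern, regex=True)

# Index the whole DataFrame with the boolean mask so both columns are kept.
data2 = data[has_special].reset_index(drop=True)   # rows with special characters
data = data[~has_special].reset_index(drop=True)   # rows without

print(data2)
print(data)
```

The loop attempts in the question fail for a different reason: `x is "[^a-zA-Z0-9 ]"` checks whether x is the same object as the pattern string, and never applies the pattern as a regex.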