
How to drop pandas dataframe columns containing special characters


How do I drop pandas dataframe columns that contain special characters such as @ / ] [ } { - _ etc.?

For example I have the following dataframe (called df):

(Screenshot of df not included.)

I need to drop the columns Name and Matchkey because they contain special characters. Also, how can I specify the list of special characters that determines which columns are dropped?

For example: I'd like to drop the columns that contain (in any record, in any cell) any of the following special characters:

listOfSpecialCharacters: ¬,`,!,",£,$,£,#,/,\


Solution

  • One option is to build a regex character class from the characters, test every column with str.contains inside apply, and then use boolean indexing to drop the columns that have a match:

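    import re
    import pandas as pd

    chars = '¬`!"£$£#/\\'
    # escape each character and wrap them all in one regex character class
    regex = f'[{"".join(map(re.escape, chars))}]'
    # '[¬`!"£\\$£\\#/\\\\]'

    # astype(str) lets numeric columns be searched too (.str fails on them)
    mask = df.apply(lambda c: c.astype(str).str.contains(regex).any())
    df2 = df.loc[:, ~mask]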
    

    Example:

    # input
         A    B    C
    0  123  12!  123
    1  abc  abc  a¬b
    
    # output
         A
    0  123
    1  abc
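A self-contained sketch of the same approach wrapped in a reusable function, so the list of special characters can be passed in as a parameter. The function name drop_columns_with_chars is hypothetical, and the sample data reproduces the example above with every value stored as a string. Cells are converted with astype(str) so that numeric columns do not break the .str accessor, and an empty character list is treated as "drop nothing", since "[]" would not be a valid regex.

```python
import re

import pandas as pd


def drop_columns_with_chars(df, chars):
    """Hypothetical helper: return df without the columns in which any
    cell contains at least one of the characters in `chars`."""
    if not chars:
        # An empty character class "[]" is not a valid regex, so keep everything
        return df.copy()
    # Build a character class such as '[@/\]\[}{\-_]';
    # re.escape makes characters like ] [ - \ safe inside the class
    pattern = "[" + "".join(re.escape(ch) for ch in chars) + "]"
    # Convert every cell to str so numeric columns can be searched too,
    # then flag each column that has at least one matching cell
    has_special = df.apply(lambda col: col.astype(str).str.contains(pattern).any())
    return df.loc[:, ~has_special]


df = pd.DataFrame({"A": ["123", "abc"], "B": ["12!", "abc"], "C": ["123", "a¬b"]})

# The characters from the question, given as a Python list
special = ["¬", "`", "!", '"', "£", "$", "#", "/", "\\"]
print(drop_columns_with_chars(df, special))  # keeps only column A

# A different character set works the same way; none of these occur in df
print(drop_columns_with_chars(df, ["@", "/", "]", "[", "}", "{", "-", "_"]))
```

Building a single character class keeps the check to one regex scan per column instead of one scan per character.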