Strip punctuation from all rows and columns in Pandas DataFrame


I'm stripping punctuation from strings contained within a Pandas dataframe. For example:

import pandas as pd
df = pd.DataFrame(data = [['a.b', 'c_d', 'e^f'],['g*h', 'i@j', 'k&l']], 
                  columns = ['column 1', 'column 2', 'column 3'])

I've succeeded in stripping punctuation within a single column using a list comprehension:

import string
df_nopunct = [line.translate(str.maketrans('', '', string.punctuation)) 
    for line in list(df['column 1'])]

# ['ab', 'gh']

But what I'd really like to do is strip punctuation across the entire dataframe, saving this as a new dataframe.

If I try the same approach on the entire dataframe, it seems to just return a list of my column names:

df_nopunct = [line.translate(str.maketrans('', '', string.punctuation)) 
    for line in list(df)]

# ['column 1', 'column 2', 'column 3']

Should I iterate line.translate(str.maketrans('', '', string.punctuation)) across columns, or is there a simpler way to accomplish this?

I've looked at the detailed answer on how to strip punctuation, but it deals with stripping punctuation from a single string rather than across an entire dataframe.


Solution
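
For context on why the list comprehension in the question returned the column names: iterating over a DataFrame, which is what list(df) does, yields its column labels rather than its cell values. A minimal sketch using the question's data (the variable names labels and first_column are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(data=[['a.b', 'c_d', 'e^f'], ['g*h', 'i@j', 'k&l']],
                  columns=['column 1', 'column 2', 'column 3'])

# Iterating a DataFrame walks its column labels, not its rows or cells.
labels = list(df)

# The cell values live in each column (a Series), reached by label.
first_column = list(df['column 1'])
```

Since the labels here contain no punctuation, translate handed them back unchanged, which is why the result looked like a plain list of column names.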

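  • You can call df.replace directly with a regular expression:

    import string
    # string.punctuation happens to form a valid character class as-is (its
    # backslash escapes the ']'); use re.escape() if you swap in other characters.
    df_trans = df.replace('['+string.punctuation+']', '', regex=True)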
    
    Out[766]:
      column 1 column 2 column 3
    0       ab       cd       ef
    1       gh       ij       kl
    

    If you prefer translate, apply str.translate to each column inside a dict comprehension and build a new DataFrame from the result:

    import string
    trans = str.maketrans('', '', string.punctuation)
    df_trans = pd.DataFrame({col: df[col].str.translate(trans) for col in df})
    
    Out[746]:
      column 1 column 2 column 3
    0       ab       cd       ef
    1       gh       ij       kl
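
One caveat with the translate approach: the .str accessor raises an AttributeError on columns that do not hold strings, so a DataFrame with numeric columns needs a guard. The following is a minimal sketch and not part of the original answer. The helper names strip_cell and strip_punctuation and the sample columns text and qty are hypothetical, and the sketch trades the vectorized .str call for a per-cell type check.

```python
import string
import pandas as pd

# Translation table that deletes every character in string.punctuation.
TRANS = str.maketrans('', '', string.punctuation)

def strip_cell(value):
    """Strip punctuation from str cells; return anything else unchanged."""
    return value.translate(TRANS) if isinstance(value, str) else value

def strip_punctuation(df):
    """Return a new DataFrame with punctuation removed from string cells."""
    # Series.map applies strip_cell to each cell of each column.
    return df.apply(lambda col: col.map(strip_cell))

# Hypothetical sample data: one text column and one numeric column.
mixed = pd.DataFrame({'text': ['a.b', 'g*h'], 'qty': [1, 2]})
cleaned = strip_punctuation(mixed)
```

The original DataFrame is left untouched, and missing values pass through as-is because they are not str instances. On large frames, the df.replace version is likely the simpler choice, since it already skips non-string cells.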