Search code examples
pythonpandastexttrimstrip

Trimming white spaces in similar strings in different columns of pandas dataframe


My question is very simple and I tried all possible SO answers before asking it.

I have a dataframe

{'article_journal': {0: 'lipids', 1: 'nature reviews  cardiology', 2: 'clinical research
 cardiology supplements', 3: 'jacc  basic to translational science', 4: 'clinical research 
 cardiology supplements', 5: 'diabetes  obesity   metabolism', 6: 'journal'}, 'Title': {0: 
 'chemistry physics lipids', 1: 'nature reviews cardiology', 2: 'clinical research 
 cardiology', 3: 'jacc basic to translational science', 4: 'clinical research cardiology', 5: 
 'diabetes obesity   metabolism', 6: 'journal'}}

There are extra white spaces in almost all strings. I checked it in Excel.

I print only those where values are identical

df[df.apply(pd.Series.nunique, axis = 1) == 1]

and receive only one row, which I created myself for testing purposes

    article_journal Title
6   journal         journal

How to trim white spaces in similar strings?

Expected output

    article_journal                     Title
1   nature reviews cardiology           nature reviews cardiology
3   jacc basic to translational science jacc basic to translational science
5   diabetes obesity metabolism         diabetes obesity metabolism
6   journal                             journal

I tried all possible questions with str.strip(), but nothing works. Columns are objects. What am I missing? Thanks!


Solution

  • You can replace the double space with a single space in the entire dataframe and also trim the leading and trailing space(s) if they exist:

    df = df.replace('\s+',' ', regex=True) # remove double spacing in strings
    df = df.replace('^\s+|\s+$','', regex=True) # remove leading or trailing spaces