Tags: python, pandas, string, dataframe, trim

Cannot remove or trim spaces from a pandas column


I'm stuck on a simple task. I have a test dataframe with spaces in it. To remove them I did the following:

df_unique['final'] = df_unique['final'].astype("string")
df_unique['final'] = df_unique['final'].str.strip()
df_unique['final'] = df_unique['final'].str.replace(' ', '')

But still:

df_unique = 

final
+123 123
+123 123 123
+12345 123

df_unique.info() shows the column as string.


I think it is not working for numbers that contain double spaces. I'm not sure, but maybe this information will help.


Solution

  • Considering that the dataframe is called df and looks like the following

             final
    0      123 123
    1  123 123 123
    2    12345 123
    

    Assuming that the goal is to create a new column, let's call it new, and store the values of the column final, but without the spaces, one can create a custom lambda function using re as follows

    import re
    
    df['new'] = df['final'].apply(lambda x: re.sub(r'\s', '', x))
    
    [Out]:
             final        new
    0      123 123     123123
    1  123 123 123  123123123
    2    12345 123   12345123
    

    If one wants to update the column final, then do the following

    df['final'] = df['final'].apply(lambda x: re.sub(r'\s', '', x))
    
    [Out]:
      
           final
    0     123123
    1  123123123
    2   12345123
    

    Another option for this last use case is to use pandas.Series.str.replace:

    df['final'] = df['final'].str.replace(r'\s', '', regex=True)
    
    [Out]:
    
           final
    0     123123
    1  123123123
    2   12345123
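
The question does not show the raw characters in the column, so the following is only a guess at why str.replace(' ', '') had no effect: if the gaps are non-breaking spaces (U+00A0) rather than ordinary spaces, a literal ' ' never matches them, while \s does. A minimal sketch with made-up data (the values and the names literal and cleaned are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: the gaps are non-breaking spaces (U+00A0), not ' '.
df = pd.DataFrame(
    {"final": ["+123\u00a0123", "+123\u00a0123\u00a0123", "+12345\u00a0123"]}
)

# Replacing the ordinary space character leaves these values unchanged.
literal = df["final"].str.replace(" ", "", regex=False)

# \s matches any Unicode whitespace in a str pattern, including U+00A0.
cleaned = df["final"].str.replace(r"\s", "", regex=True)

# repr() makes hidden characters visible, e.g. '+123\xa0123'.
print(df["final"].map(repr).tolist())
print(literal.tolist())
print(cleaned.tolist())
```

Checking repr() of a few values first is a cheap way to confirm which whitespace characters are actually present before choosing a pattern.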
    

    Note:

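    • One needs to pass regex=True. On pandas versions before 2.0, omitting it gives

    FutureWarning: The default value of regex will change from True to False in a future version

    Since pandas 2.0 the default is regex=False, so without the flag \s is treated as a literal string and no whitespace is removed.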
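
A small sketch of the difference the regex flag makes, using a made-up Series named s (the names as_literal and as_regex are for illustration only):

```python
import pandas as pd

s = pd.Series(["123 123", "123 123 123", "12345 123"])

# regex=False: the pattern is the literal two characters backslash + 's',
# which never occur in the data, so the values come back unchanged.
as_literal = s.str.replace(r"\s", "", regex=False)

# regex=True: \s is interpreted as the whitespace character class.
as_regex = s.str.replace(r"\s", "", regex=True)

print(as_literal.tolist())
print(as_regex.tolist())
```

Passing regex explicitly in both cases keeps the behaviour the same across pandas versions, regardless of the default.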