python, dataframe, python-re

How to use re.sub in a pandas dataframe


def not_value(x):
    if type(x) == str:
        re.sub(r'(\s+)', np.nan, x)
    else:
        pass

df_copy=df.copy()
df_copy.astype(str).applymap(lambda x: not_value(x))

I have checked that the values in the dataframe are strings, but I always get TypeError: decoding to str: need a bytes-like object, float found. What is the problem?

Thank you for giving me an answer.


Solution

  • If you just want to replace the values in a particular string column with np.nan when the string consists entirely of whitespace, you can do the following. Adjust the regular expression if you want to match something other than all-whitespace strings:

    import pandas as pd
    import re
    import numpy as np
    
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': ['s ', '  ']}
    
    df = pd.DataFrame(data=d)
    
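    # Boolean mask: True where the string is whitespace only (raw string avoids an invalid-escape warning)
    spaces = df['col3'].str.contains(r'^\s+$')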
    df.loc[spaces, 'col3'] = np.nan
    df
    

    Result:

       col1  col2 col3
    0     1     3   s 
    1     2     4  NaN
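
  • As for the error itself: re.sub expects its replacement argument to be a string or a function, so passing np.nan (a float) raises the TypeError. The not_value function also never returns anything, so applymap would fill the frame with None even without the error. The following is a minimal sketch of two possible fixes, assuming the same example frame as above; blank_to_nan is a hypothetical helper name, not something from the question:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': ['s ', '  ']})

# Option 1: replace whitespace-only strings with NaN across the whole frame.
# With regex=True the pattern is only applied to string cells, so the
# integer columns are left as they are.
df_clean = df.replace(r'^\s+$', np.nan, regex=True)


def blank_to_nan(x):
    """Return np.nan for whitespace-only strings; return anything else unchanged."""
    if isinstance(x, str) and re.fullmatch(r'\s+', x):
        return np.nan
    return x


# Option 2: element-wise version of the original idea, applied to one column.
# Note that the function returns a value in every branch.
col3_clean = df['col3'].map(blank_to_nan)
```

    Either way, calling astype(str) first is probably not what you want, since it turns existing missing values into the literal string 'nan'.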