Search code examples
pythondataframefunctionoutliersfillna

find in datfarame outliers and fill with nan python


I am trying to make a function to spot the columns with "100" in the header and replace all values in these columns that are above 100 with nan values :

import pandas as pd
    
data = {'first_100':  ['25', '1568200', '5'],
            'second_column': ['first_value', 'second_value', 'third_value'],
             'third_100':['89', '9', '589'],
            'fourth_column':['first_value', 'second_value', 'third_value'],
            }
    
df = pd.DataFrame(data)

print (df)

enter image description here

so this is the output I am looking for

enter image description here


Solution

  • Use filter to identify the columns with '100', to_numeric to ensure having numeric values, then mask with a boolean array:

    cols = df.filter(like='100').columns
    
    df[cols] = df[cols].mask(df[cols].apply(pd.to_numeric, errors='coerce').gt(100))
    

    output:

      first_100 second_column third_100 fourth_column
    0        25   first_value        89   first_value
    1       NaN  second_value         9  second_value
    2         5   third_value       NaN   third_value