Search code examples
pythonpandasdataframeoutliers

Python Pandas: How to remove the outliers in a column, and replace them with prior values (assuming they are not outlier)?


I have a dataframe that looks like this:

Date | Value. 
2020-03-18 10 
2020-03-19 20 
2020-03-20 30 
2020-03-21 25 
2020-03-22 35 
2020-03-23 50 
2020-03-24 1900000 
2020-03-25 1200000 
2020-03-26 50 
2020-03-27 25 
2020-03-28 35 
2020-03-29 50 
2020-03-30 25 
2020-03-31 35 
... 
2021-01-31 45 

What is the quickest way for me to replace the outliers in value column with prior non-outlier values?

I need to be careful not to simply replace outlier with prior number, because it would mess up if the prior value is also an outlier (as seen in 2020-03-24).

Thanks so much for your help!


Solution

  • As commented, your sample data do not have outlier. However, according to your logic, you can do:

    std, mean = df['Value'].agg(['std','mean'])
    
    df['Value'] = df['Value'].where(df['Value'].between(mean-3*std, mean+3*std)).ffill()