I have a dataframe that looks like this:
Date | Value.
2020-03-18 10
2020-03-19 20
2020-03-20 30
2020-03-21 25
2020-03-22 35
2020-03-23 50
2020-03-24 1900000
2020-03-25 1200000
2020-03-26 50
2020-03-27 25
2020-03-28 35
2020-03-29 50
2020-03-30 25
2020-03-31 35
...
2021-01-31 45
What is the quickest way for me to replace the outliers in value
column with prior non-outlier values?
I need to be careful not to simply replace outlier with prior number, because it would mess up if the prior value is also an outlier (as seen in 2020-03-24
).
Thanks so much for your help!
As commented, your sample data do not have outlier. However, according to your logic, you can do:
std, mean = df['Value'].agg(['std','mean'])
df['Value'] = df['Value'].where(df['Value'].between(mean-3*std, mean+3*std)).ffill()