The problem I am facing: while computing a rolling average with pandas, how can I reject a window of 10 rows if one or more of its rows contains an outlier?
For clarification:
df = df['speed'].rolling(10).mean()
outlier_lower_bound = 0
outlier_upper_bound = 15
df.max()
Now, how do I reject (i.e. not consider) the average of a 10-period window if it contains an outlier?
The end goal is to get the max moving average while ignoring any 10-period window that contains an outlier. Thanks in advance!
You can fix your issue in a couple of lines, like so:
_filter = lambda x: float("inf") if x > outlier_upper_bound or x < outlier_lower_bound else x
df["speed"].apply(_filter).rolling(10).mean().dropna()
The idea behind my code can be understood in these steps:
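1. Define a lambda function, _filter, that converts any value outside your boundaries into inf.
2. When pandas computes .mean over a window that has inf in it, the result will be NaN.
3. Drop the NaN values with .dropna(), so every window that contained an outlier is excluded, which has the same effect as rejecting it.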