I want to remove values that do not lie between -2σ and 2σ. How do I remove these outliers using the IQR?
I implemented this equation as follows.
iqr = df['abc'].percentile(0.75) - df['abc'].percentile(0.25)
cond1 = (df['abc'] > df['abc'].percentile(0.75) + 2 * iqr)
cond2 = (df['abc'] < df['abc'].percentile(0.25) - 2 * iqr)
df[cond1 & cond2]
Is this the right way?
This is not right. Your iqr is almost never equal to σ. Quartiles and deviations are not the same things.
Fortunately, you can easily compute the standard deviation of a numerical Series using Series.std().
sigma = df['abc'].std()
mean = df['abc'].mean()
# Keep only rows strictly within two standard deviations of the mean
cond1 = (df['abc'] > mean - 2 * sigma)
cond2 = (df['abc'] < mean + 2 * sigma)
df[cond1 & cond2]
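If you do want an IQR-based filter instead, here is a minimal sketch of one common approach (Tukey-style fences). Two things to note about the code in the question: pandas Series has quantile(), not percentile(), and combining "above the upper bound" with "below the lower bound" using & selects nothing, since no value can satisfy both. The sample data and the multiplier k = 1.5 are assumptions for illustration only; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'abc' comes from the question.
df = pd.DataFrame({'abc': [10, 12, 11, 13, 12, 11, 10, 12, 13, 100]})

# Quartiles and interquartile range (Series.quantile, not percentile)
q1 = df['abc'].quantile(0.25)
q3 = df['abc'].quantile(0.75)
iqr = q3 - q1

# k = 1.5 is the conventional Tukey multiplier (an assumption; tune as needed)
k = 1.5
lower = q1 - k * iqr
upper = q3 + k * iqr

# Keep rows inside the fences (between is inclusive on both ends by default)
iqr_filtered = df[df['abc'].between(lower, upper)]

# For comparison: the 2-sigma filter from the answer above
mean = df['abc'].mean()
sigma = df['abc'].std()
sigma_filtered = df[(df['abc'] > mean - 2 * sigma) & (df['abc'] < mean + 2 * sigma)]

print(iqr_filtered)
print(sigma_filtered)
```

The two filters are different criteria and will not agree in general. For roughly normal data the IQR is about 1.35σ, so the fences Q1 - k * IQR and Q3 + k * IQR do not coincide with mean ± 2σ for k = 1.5 or k = 2.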