I'm trying to identify outliers within each housing type category, but I'm running into an issue. Whenever I run the code, I receive the following error: "IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)."
grouped = df.groupby('Type')
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
iqr = q3 - q1
upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)
outliers = df[(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)) | (df["price"].reset_index(drop=True) < lower_bound[df["Type"].reset_index(drop=True)])]
print(outliers)
When I run this part of the code on its own:
(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)).reset_index(drop = True)
I get a boolean Series, but when I put it inside df[] it breaks.
Use transform to compute q1/q3; this will maintain the original index:
q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))
iqr = q3 - q1
upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)
outliers = df[df["price"].gt(upper_bound) | df["price"].lt(lower_bound)]
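To see why this helps, here is a minimal, self-contained sketch. The data and the letter index are made up for illustration; only the "Type" and "price" column names come from the question. It compares the index of the per-group quantile with the index of the transformed one, then filters with the aligned mask:

```python
import pandas as pd

# Hypothetical toy data: one obvious outlier per housing type.
df = pd.DataFrame(
    {
        "Type": ["house"] * 5 + ["flat"] * 5,
        "price": [100, 110, 120, 130, 1000, 50, 55, 60, 65, 5],
    },
    index=list("abcdefghij"),  # deliberately not the default 0..n-1 index
)

grouped = df.groupby("Type")

# quantile() alone returns one value per group, indexed by the group key.
per_group_q1 = grouped["price"].quantile(0.25)
print(per_group_q1.index.tolist())

# transform() broadcasts each group's value back to every row of that group,
# so the result has one entry per row and carries df's own index.
q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))
iqr = q3 - q1
upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

# The mask shares df's index, so it can be used directly for boolean indexing.
mask = df["price"].gt(upper_bound) | df["price"].lt(lower_bound)
outliers = df[mask]
print(outliers)
```

With this toy data, the rows that come back are the 1000 "house" and the 5 "flat". Because the mask is built from Series that already share df's index, no reset_index calls are needed.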