pandas, outliers

Detect the "outliers"


In a column I have values like 0.7, 0.85, 0.45, etc., but occasionally there is a value such as 2.13 that is very different from the majority of the values. How can I spot these "outliers"?

Thank you


Solution

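  • Compute a z-score for every value, then keep only the rows whose z-scores are all small. Call scipy.stats.zscore(df) on a numeric DataFrame to get the z-score of each value, i.e. how many standard deviations it lies from its column's mean. Pass the result to numpy.abs() to get absolute values. The expression (abs_z_scores < 3).all(axis=1) then gives a boolean mask that is True for rows where every column is within 3 standard deviations of its mean. Index the original DataFrame with this mask to drop the outlier rows, or with the inverted mask (~filtered_entries) to see the outliers themselves.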

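    import numpy as np
    from scipy import stats

    # df is assumed to be an existing DataFrame with only numeric columns
    z_scores = stats.zscore(df)
    abs_z_scores = np.abs(z_scores)
    # True for rows where every column lies within 3 standard deviations
    filtered_entries = (abs_z_scores < 3).all(axis=1)
    new_df = df[filtered_entries]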
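
For the single-column case in the question, here is a minimal sketch of the same idea. The column name "value" and the sample numbers are made up for illustration (only 0.7, 0.85, 0.45 and 2.13 come from the question). The z-score is computed by hand with ddof=0, which matches the default of scipy.stats.zscore. One caveat: with the population standard deviation, the largest possible |z| among n values is (n-1)/sqrt(n), so a threshold of 3 can never fire with fewer than 11 values. For short columns, the interquartile-range (IQR) rule shown alongside may be a more practical alternative. It is a different technique, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample column: mostly values below 1, plus one 2.13.
df = pd.DataFrame({"value": [0.7, 0.85, 0.45, 0.6, 0.75, 0.5, 0.8,
                             0.65, 0.55, 0.9, 0.7, 0.6, 2.13]})
col = df["value"]

# z-score rule: distance from the mean, in standard deviations.
# ddof=0 (population std) matches the default of scipy.stats.zscore.
z = (col - col.mean()) / col.std(ddof=0)
z_outliers = df[z.abs() > 3]

# IQR rule: flag values more than 1.5 * IQR outside the quartiles.
q1, q3 = col.quantile(0.25), col.quantile(0.75)
iqr = q3 - q1
iqr_mask = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
iqr_outliers = df[iqr_mask]

print(z_outliers)
print(iqr_outliers)
```

On this sample both rules flag only the row containing 2.13. The cutoffs (3 standard deviations, 1.5 times the IQR) are conventions, so adjust them to how strict you want to be.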