Detect the "outliers"

In a column I have values like 0.7, 0.85, 0.45, etc., but occasionally a value such as 2.13 appears, which is very different from the majority of the values. How can I spot these "outliers"?

Thank you

Solution

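Call scipy.stats.zscore(df), where df is your DataFrame, to compute the z-score of each value relative to the mean and standard deviation of its column. Take the absolute value of the result with numpy.abs(). The expression (abs_z_scores < 3).all(axis=1) then builds a boolean mask that is True for rows in which every column lies within 3 standard deviations of its column mean. Filter the original DataFrame with this mask to drop the rows that contain outliers. This assumes all columns are numeric; the threshold of 3 is a common convention, not a fixed rule.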

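import numpy as np
from scipy import stats

# df is your DataFrame; all columns are assumed to be numeric
z_scores = stats.zscore(df)  # z-score of each value within its column
abs_z_scores = np.abs(z_scores)
filtered_entries = (abs_z_scores < 3).all(axis=1)  # True for rows with no outlier
new_df = df[filtered_entries]  # keep only the rows without outliers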
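
For the single-column case in the question, the same idea can be sketched as below. The column name "value" and the sample numbers are made up for illustration (only 0.7, 0.85, 0.45 and 2.13 come from the question). Because there is only one column, the .all(axis=1) step is not needed, and inverting the mask gives the outliers themselves.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data: 19 "typical" values plus the 2.13 from the question
values = [0.7, 0.85, 0.45, 0.6, 0.75, 0.5, 0.8, 0.65, 0.55, 0.9,
          0.7, 0.6, 0.8, 0.5, 0.75, 0.65, 0.85, 0.55, 0.7, 2.13]
df = pd.DataFrame({"value": values})  # "value" is an assumed column name

# z-score of each entry relative to the column mean and standard deviation
z = stats.zscore(df["value"])

# True where the entry is within 3 standard deviations of the mean
mask = np.abs(z) < 3

new_df = df[mask]     # rows kept
outliers = df[~mask]  # rows flagged as outliers

print(outliers)
```

Keep in mind that with very few rows a z-score threshold of 3 may never trigger: with the default population standard deviation, roughly 11 or more values are needed before any single point can reach |z| = 3. For small samples a lower threshold or a different rule (for example one based on the interquartile range) may be more suitable.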
