Search code examples
sqloutliers

select outliers based on sigma and standard deviation in sql


The sample data is like this.

enter image description here

I want select outliers out of 4 sigma for each class. I tried

  select  value,class,AVG(value) as mean, STDEV(value)as st, size  from Data
 having value<mean-2*st OR value>mean+2*st group by calss

it seems does not work. Should I use having or where clause here?

The results I want is the whole 3rd row and 8th row.


Solution

  • When the condition you are looking at is a property of the row, use where i.e. where class = 1 (all rows with class 1) or where size > 2 (all rows with size > 2). When the condition is a property of a set of rows you use group by ... having e.g. group by class having avg(value) > 2 (all classes with average value > 2).

    In this case you want where but there is a complication. You don't have enough information in each row alone to write the necessary where clause, so you will have to get it through a subquery.

    Ultimately you want something like SELECT value, class, size FROM Data WHERE value < mean - 2 *st OR value > mean + 2*st; however you need a subquery to get mean and st.

    One way to do this is:

    SELECT value, Data.class, size, mean, st FROM Data, 
    INNER JOIN (
      SELECT class, AVG(value) AS mean, STDEV(value) AS st 
      FROM Data GROUP BY class
    ) AS stats ON stats.class = Data.class
    WHERE value < mean - 2 * st OR value > mean + 2 * st;
    

    This creates a subquery which gets your means and standard deviations for each class, joins those numbers to the rows with matching classes, and then applies your outlier check.