I'm working on a problem of finding mean processing times. I'm trying to eliminate outlier data by essentially performing a average on only the best 80% of the data.
I am struggling trying to adapt existing Top N per Group solutions to perform averaging per group. Using SQL Server 2008.
Here is a sample of what the table looks like:
OpID | ProcessMin | Datestamp
2 | 234 | 2012-01-26 09:07:29.000
2 | 222 | 2012-01-26 10:04:22.000
3 | 127 | 2012-01-26 11:09:51.000
3 | 134 | 2012-01-26 05:02:11.000
3 | 566 | 2012-01-26 05:27:31.000
4 | 234 | 2012-01-26 04:08:41.000
I want it to take the lowest 80% of the ProcessMin for each OpID, and take the average of that array. Any help would be appreciated!
* UPDATE *
Given the following table:
OpID ProcessMin Datestamp
602 33 46:54.0
602 36 38:59.0
602 37 18:45.0
602 39 22:01.0
602 41 36:43.0
602 42 33:00.0
602 49 03:48.0
602 51 22:08.0
602 69 39:15.0
602 105 59:56.0
603 13 34:07.0
603 18 07:17.0
603 31 57:07.0
603 39 01:52.0
603 39 01:02.0
603 40 40:10.0
603 46 22:56.0
603 47 11:03.0
603 48 40:13.0
603 56 25:01.0
I would expect this output:
OptID ProcessMin
602 41
603 34.125
Notice that since there are 10 data points for each OpID, it would only average the lowest 8 values (80%).
You can use ntile
select OpID,
avg(ProcessMin) as ProcessMin
from
(
select OpID,
ProcessMin,
ntile(5) over(partition by OpID order by ProcessMin) as nt
from YourTable
) as T
where nt <= 4
group by OpID
If ProcessMin is an integer
you can do avg(cast(ProcessMin as float)) as ProcessMin
to get the decimal average value.