Search code examples
sqlsql-servercountaveragegreatest-n-per-group

Return the mean of a grouped value along with the mean of the top n% of that value in the same query?


I need to write one query that returns both the average value of fields in a group as well as the average of the top 33% of the values of those fields in a group.

UserId | Sequence | Value | Value2
-------|----------|-------|-------
1      | 1        | 5     | 0
1      | 2        | 10    | 15
1      | 3        | 15    | 20
1      | 4        | NULL  | 25
1      | 5        | NULL  | 30
1      | 6        | NULL  | 60

The return needs to also contain the denominators used to calculate the means, I want to group by user and return something like this:

UserId | ValueMean | ValueDenom | ValueTopNMean | ValueTopNDenom | Value2Mean | Value2Denom | Value2TopNMean | Value2TopNDenom
-------|-----------|------------|---------------|----------------|------------|-------------|----------------|----------------
1      | 10        | 3          | 15            | 1              | 25         | 6           | 45             | 2

I've tried various window functions (NTILE, PERCENT_RANK, etc.), but what is tricky is I have multiple fields of values that will need to undergo this same operation, and the denominators for each Value field will vary (n% will stay the same, however). Please let me know if I've been unclear or you need more information.


Solution

  • The overall average and top value, as well as the count of non-null values, can easily be computed with aggregate functions.

    As for the average and count of top N values: you can use ntile() in a subquery to identify the relevant rows first, then use that information in conditional expressions within aggregate functions in the outer query.

    select
        userid,
        avg(value)   avg_value,
        count(value) cnt_value,
        max(value)   top_value,
        avg(case when ntile_value = 1 then value end)    avg_topn_value,
        sum(case when ntile_value = 1 then 1 else 0 end) cnt_topn_value
    from (select t.*, ntile(3) over(order by value) ntile_value from mytable t) t
    group by userid