Search code examples
sqlsql-serverpercentile-cont

How can I include a PERCENTILE_CONT column within a select statement without generating an error about the ORDER BY clause or aggregate function?


I have a need to generate a particular report from some data and I am having a great deal of trouble figuring out the proper usage of PERCENTILE_CONT to provide the results I need. I would like to include a column in my query result which shows what value is the 95th percentile from a range of values.

I have a table as follows:

customer_id sale_amount sale_date
1   265.75  2019-09-11 00:00:04.000
1   45.75   2019-09-10 01:00:04.000
1   2124.77 2019-09-10 04:00:04.000
1   66.99   2019-09-10 04:20:04.000
1   266.49  2019-09-09 11:20:04.000
1   3266.49 2019-09-08 11:20:04.000

Pretty straightforward.

I can run the following query, no problem:

select
    min(sale_amount) as minimum_sale,
    max(sale_amount) as maximum_sale,
    avg(sale_amount) as average_sale
from
    sales
where
    customer_id = 1;

Which results in the following output:

minimum_sale    maximum_sale    average_sale
45.75           3266.49     1006.040000

What I'm after is a fourth column, perc_95, which would calculate what value represents the 95th percentile in terms of sale_amount.

This works to get me the value:

select distinct
    customer_id,
    percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
from
    sales;

Output:

customer_id perc_95
1            2981.06

But I can't seem to combine them - this fails:

select distinct
    (customer id),
    min(sale_amount) as minimum_sale,
    max(sale_amount) as maximum_sale,
    avg(sale_amount) as average_sale,
    percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
from
    sales
where
    customer_id = 1;

Output:

Column 'sales.customer_id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

I generally understand what this error means but I am having trouble figuring out how to deal with it in this context.

My desired output:

customer_id     minimum_sale      maximum_sale  average_sale    perc_95
1                   45.75         3266.49  1006.040000     2981.06

Solution

  • Use window functions:

    select distinct customer_id,
           min(sale_amount) over (partition by customer_id) as minimum_sale, 
           max(sale_amount) over (partition by customer_id) as maximum_sale,
           avg(sale_amount) over (partition by customer_id) as average_sale,
           percentile_cont(0.95) within group (order by sale_amount)  over (partition by customer_id) as perc_95
    from sales
    where customer_id = 1;
    

    It is highly inconvenient that SQL Server does not support functions such as percentile_cont() as aggregation functions, requiring people to use select distinct for aggregation.