I have a need to generate a particular report from some data and I am having a great deal of trouble figuring out the proper usage of PERCENTILE_CONT to provide the results I need. I would like to include a column in my query result which shows what value is the 95th percentile from a range of values.
I have a table as follows:
customer_id sale_amount sale_date
1 265.75 2019-09-11 00:00:04.000
1 45.75 2019-09-10 01:00:04.000
1 2124.77 2019-09-10 04:00:04.000
1 66.99 2019-09-10 04:20:04.000
1 266.49 2019-09-09 11:20:04.000
1 3266.49 2019-09-08 11:20:04.000
Pretty straightforward.
I can run the following query, no problem:
select
min(sale_amount) as minimum_sale,
max(sale_amount) as maximum_sale,
avg(sale_amount) as average_sale
from
sales
where
customer_id = 1;
Which results in the following output:
minimum_sale maximum_sale average_sale
45.75 3266.49 1006.040000
What I'm after is a fourth column, perc_95, which would calculate what value represents the 95th percentile in terms of sale_amount.
This works to get me the value:
select distinct
customer_id,
percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
from
sales;
Output:
customer_id perc_95
1 2981.06
But I can't seem to combine them - this fails:
select distinct
(customer id),
min(sale_amount) as minimum_sale,
max(sale_amount) as maximum_sale,
avg(sale_amount) as average_sale,
percentile_cont(0.95) WITHIN GROUP (order by sale_amount) OVER (partition by customer_id) as perc_95
from
sales
where
customer_id = 1;
Output:
Column 'sales.customer_id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I generally understand what this error means but I am having trouble figuring out how to deal with it in this context.
My desired output:
customer_id minimum_sale maximum_sale average_sale perc_95
1 45.75 3266.49 1006.040000 2981.06
Use window functions:
select distinct customer_id,
min(sale_amount) over (partition by customer_id) as minimum_sale,
max(sale_amount) over (partition by customer_id) as maximum_sale,
avg(sale_amount) over (partition by customer_id) as average_sale,
percentile_cont(0.95) within group (order by sale_amount) over (partition by customer_id) as perc_95
from sales
where customer_id = 1;
It is highly inconvenient that SQL Server does not support functions such as percentile_cont()
as aggregation functions, requiring people to use select distinct
for aggregation.