I am trying to computer the median number of transactions in each category. A few notes (as the dataset below is a small snippet of a much larger dataset):
The data is set up like this:
| Person | Category | Transaction |
|:-------:|:--------:|:-----------:|
| PersonA | Sales | 27 |
| PersonB | Sales | 75 |
| PersonC | Sales | 87 |
| PersonD | Sales | 36 |
| PersonE | Sales | 70 |
| PersonB | Buys | 60 |
| PersonC | Buys | 92 |
| PersonD | Buys | 39 |
| PersonA | HR | 59 |
| PersonB | HR | 53 |
| PersonC | HR | 98 |
| PersonD | HR | 54 |
| PersonE | HR | 70 |
| PersonA | Other | 46 |
| PersonC | Other | 66 |
| PersonD | Other | 76 |
| PersonB | Other | 2 |
An ideal output would look like:
| Category | Median | Average |
|:--------:|:------:|:-------:|
| Sales | 70 | 59 |
| Buys | 60 | 64 |
| HR | 59 | 67 |
| Other | 56 | 48 |
I can get the average by:
SELECT
Category,
AVG(Transaction) AS Average_Transactions
FROM
table
GROUP BY
Category
And that works great!
This post tried to help me find the median. What I wrote was:
SELECT
Category,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM
table
GROUP BY
Category
But I get an error:
Msg 8120: Column 'Transactions' is invalid in the select list because it is not contained in either an aggregate function or the **GROUP BY** clause
How can I fix this?
You can do what you want using SELECT DISTINCT
:
SELECT DISTINCT Category,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Transaction) OVER (PARTITION BY Category) AS Median_Transactions
FROM table;
Unfortunately, SQL Server doesn't offer the PERCENTILE_
functions as window functions and doesn't have a MEDIAN()
aggregation function. You can also do this using subqueries and counts.