Search code examples
sqlsql-servert-sqlmedian

Partitioned Median


I wonder if anyone can help I am trying to calculate a median which abides by the groupings in my view.

I like the following code but it just gives me the over all median in each row. I think i need to use OVER(PARTITION BY()) but i just can’t make heads nor tails of it even after furious googling and reading renowned articles like this https://sqlperformance.com/2012/08/t-sql-queries/median

 `SELECT
 YEAR(reportsubmitted) as “Year Submitted”,
 Month(reportsubmitted) as “Month Submitted”, COUNT (DISTINCT(propertyid)) as 
 “Number of Reports Submitted”, SUM([report fee]) as “Total Report Fee”,

(
(SELECT MAX([days From Audit to Submission])

FROM (SELECT TOP 50 PERCENT ([days From Audit to Submission] )

FROM vwCMnAuditorsProcessLength WHERE ReportSubmitted > ‘2017-04-01’ ORDER BY 
[days From Audit to Submission] ) AS x)

(SELECT MIN([days From Audit to Submission])

FROM (SELECT TOP 50 PERCENT [days From Audit to Submission]
FROM vwCMnAuditorsProcessLength WHERE ReportSubmitted > ‘2017-04-01’ ORDER BY 
[Report Fee] DESC) AS y)  
) / 2.0 as “Median Days”

FROM vwCMnAuditorsProcessLength
WHERE reportsubmitted >= ‘2017-04-01’

GROUP BY MONTH(reportsubmitted), YEAR(reportsubmitted)`

I did try the below as something different but it seems to be discounting a lot of data

SELECT

[MMYYYY ReportSubmitted],

[Total Report Fee],

[Number of Reports Submitted],

AVG([days from audit to submission]) as “Median days to Submission”

FROM (

SELECT [MMYYYY ReportSubmitted], [report fee], propertyid,
CAST([days from audit to submission] as decimal(5,2)) [days from audit to submission],

ROW_NUMBER() OVER(
Partition by [MMYYYY ReportSubmitted]
Order by [days from audit to submission] ASC) AS “RowASC”,

ROW_NUMBER() OVER(
Partition by [MMYYYY ReportSubmitted]
Order by [days from audit to submission] DESC) AS “RowDESC”,

SUM([report fee]) OVER(Partition by [MMYYYY ReportSubmitted] Order by [days from 
 audit to submission]) AS “Total Report Fee”,
COUNT(propertyid) OVER(Partition by [MMYYYY ReportSubmitted] Order by [days from audit to submission]) AS “Number of Reports Submitted”

FROM vwCMnAuditorsProcessLength) x

WHERE RowASC in (RowDESC,RowDESC-1,RowDESC+1)

 Group by [MMYYYY ReportSubmitted], [Total Report Fee], [Number of Reports Submitted]
Order by [MMYYYY ReportSubmitted]

If anyone has any ideas I would be really greatful


Solution

  • If you don't care about performance then the simplest way is the best:

    SELECT SalesPerson, Median = MAX(Median)
    FROM
    (
       SELECT SalesPerson,Median = PERCENTILE_CONT(0.5) WITHIN GROUP 
         (ORDER BY Amount) OVER (PARTITION BY SalesPerson)
       FROM dbo.Sales
    ) 
    AS x
    GROUP BY SalesPerson;
    

    Example from: https://sqlperformance.com/2014/02/t-sql-queries/grouped-median

    If you want even simpler method I recommend CRL function: https://stackoverflow.com/a/16719240/1903793

    It lets you to calculate median like this:

    SELECT dbo.Median(Field) FROM Table