sql, sql-server, t-sql, median

Finding the median in SQL Server up to every date in the table


I use the query below to find the median for every sector:

    SELECT DISTINCT Sector,
        PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value)
            OVER (PARTITION BY Sector) AS Median
    FROM TABLE

The table is in the following format:

    Sector  Date    Value
    A   2014-08-01  1
    B   2014-08-01  5
    C   2014-08-01  7
    A   2014-08-02  6
    B   2014-08-02  5
    C   2014-08-02  4
    A   2014-08-03  3
    B   2014-08-03  9
    C   2014-08-03  6
    A   2014-08-04  5
    B   2014-08-04  8
    C   2014-08-04  9
    A   2014-08-05  5
    B   2014-08-05  7
    C   2014-08-05  2   

So I get the expected result, shown below:

    Sector  Median
    A   5
    B   7
    C   6

Now I need to change the process so that the medians are calculated using only the records up to the given date. The new result would be:

    Sector  Date    Median
    A   2014-08-01  1
    B   2014-08-01  5
    C   2014-08-01  7 (Only 1 record each was considered for A, B and C)

    A   2014-08-02  3.5
    B   2014-08-02  5
    C   2014-08-02  5.5 (2 records each were considered for A, B and C)

    A   2014-08-03  3
    B   2014-08-03  5
    C   2014-08-03  6 (3 records each were considered for A, B and C)

    A   2014-08-04  4
    B   2014-08-04  6.5
    C   2014-08-04  6.5 (4 records each were considered for A, B and C)

    A   2014-08-05  5
    B   2014-08-05  7
    C   2014-08-05  6 (All 5 records each were considered for A, B and C)

So this would be a sort of cumulative median. Can someone please tell me how to achieve this? My table has about 2.3M records with about 1100 records each for about 1100 dates.

Please let me know if you need any info.


Solution

  • That makes it harder, because PERCENTILE_DISC does not accept an ORDER BY in its OVER clause, so the following does not work:

    SELECT DISTINCT Sector, Date,
           PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value) OVER (PARTITION BY sector ORDER BY DATE) AS Median
    FROM TABLE;
    

    Alas. You can use cross apply for this purpose:

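    -- For each row, take the median over the same sector's rows up to that row's date.
    -- The OVER clause only partitions; PERCENTILE_DISC does not allow ORDER BY there.
    select t.sector, t.date, t.value, m.median
    from table t cross apply
         (select top (1) PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY t2.Value) OVER (PARTITION BY t2.sector) AS Median
          from table t2
          where t2.sector = t.sector and t2.date <= t.date
         ) m;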
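
One caveat on the function choice: PERCENTILE_DISC always returns a value that actually occurs in the set, so for an even number of rows it cannot produce the interpolated medians (3.5, 5.5, 6.5) shown in the question. A minimal sketch of the same cross apply idea with PERCENTILE_CONT swapped in is below. It assumes SQL Server 2012 or later, and the table variable @SectorValues is a hypothetical stand-in for the real table, loaded with the sample rows from the question.

```sql
-- Hypothetical table variable holding the sample rows from the question.
DECLARE @SectorValues TABLE (
    Sector char(1) NOT NULL,
    [Date] date    NOT NULL,
    Value  int     NOT NULL
);

INSERT INTO @SectorValues (Sector, [Date], Value) VALUES
    ('A', '2014-08-01', 1), ('B', '2014-08-01', 5), ('C', '2014-08-01', 7),
    ('A', '2014-08-02', 6), ('B', '2014-08-02', 5), ('C', '2014-08-02', 4),
    ('A', '2014-08-03', 3), ('B', '2014-08-03', 9), ('C', '2014-08-03', 6),
    ('A', '2014-08-04', 5), ('B', '2014-08-04', 8), ('C', '2014-08-04', 9),
    ('A', '2014-08-05', 5), ('B', '2014-08-05', 7), ('C', '2014-08-05', 2);

-- For every (Sector, Date) row, compute the median of that sector's values
-- up to and including that date.
SELECT t.Sector, t.[Date], m.Median
FROM @SectorValues AS t
CROSS APPLY (
    -- Every row of the inner result carries the same median, so TOP (1)
    -- simply collapses it to a single row.
    SELECT TOP (1)
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY t2.Value)
               OVER (PARTITION BY t2.Sector) AS Median
    FROM @SectorValues AS t2
    WHERE t2.Sector = t.Sector
      AND t2.[Date] <= t.[Date]
) AS m
ORDER BY t.[Date], t.Sector;
```

With an odd number of rows the two functions agree, which is why the whole-table query in the question looked right with PERCENTILE_DISC. Note that this approach rescans a sector's earlier rows for every output row, so on a table of the size mentioned in the question it may be slow. An index on (Sector, Date) that includes Value would probably help the correlated lookup, but that is an assumption worth testing rather than a guarantee.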