Search code examples
sqlsql-serverexceldatabasemedian

Recreate Moving Median and Moving Mode Excel formula in SQL


I am trying to recreate the below Excel formula/table and have True/False display but am stuck.

Excel Formula: =ABS(ROUND(MEDIAN(C$2:C2),0)-ROUND(MODE.SNGL(C$2:C2),0))<[sample.xlsx]variables!$B$2

I have 200+ rows of data and I need the Median and the Mode only calculated from the first row to the current row. I can create the Median for all rows in SQL, but this is incorrect for what I need. Same with the Mode. The formula above would be in Excel cell D2 and populated down. The variable it is calling at the end of the formula is just the number 4.

Any suggestions or pointers would be great. Thanks!

Excel snippet: Quick view of how this table looks in Excel

SQL code to build this exact table in SQL.

    IF NOT EXISTS (
    select * from sysobjects where name='SampleExample' and xtype='U'
) CREATE TABLE SampleExample (
    [Seconds] INT,
    [Sequence] INT,
    [Value] NUMERIC(12, 9),
    [Result] NVARCHAR(4)
);
INSERT INTO SampleExample VALUES
    (598,1,236.888453364,N'#N/A'),
    (740,2,236.888453364,N'True'),
    (885,3,235.463708639,N'True'),
    (1024,4,236.177295446,N'True'),
    (1189,5,236.177295446,N'True'),
    (1330,6,236.866638064,N'True'),
    (1463,7,236.177295446,N'True'),
    (1599,8,236.866638064,N'True'),
    (1735,9,236.866638064,N'True'),
    (1863,10,236.866638064,N'True'),
    (1986,11,236.866638064,N'True'),
    (2110,12,236.866638064,N'True'),
    (2235,13,236.880749464,N'True'),
    (2362,14,236.908763647,N'True'),
    (2487,15,236.908763647,N'True'),
    (2610,16,236.908763647,N'True'),
    (2739,17,237.190827727,N'True'),
    (2865,18,237.190827727,N'True'),
    (3008,19,237.190827727,N'True'),
    (3132,20,237.190827727,N'True');

Current Median Query. I added a column in my SQL table called Filename that is the same value for all rows. But this finds the Median for all rows in the tablet not row 1 through current row.

Declare @Median AS INT
Select @Median = ( 
    (Select MAX([Value])
    FROM 
        (Select TOP 50 PERCENT [Value], [Filename] 
        FROM SampleExample 
        
        Order by [Filename]) as BOTTOMHALF)
    + 
    (Select MIN([Value])
    FROM
        (Select TOP 50 PERCENT [Value], [Filename]
        FROM SampleExample 
        
        Order by [Filename] desc) as TOPHALF) ) / 2 

Current Mode Query:

Declare @Mode as INT
Select @Mode = (
    Select TOP 1 ROUND([Value],0) as MODE
    from SampleExample 
    Group by [Value]
    Order by COUNT(*) DESC
    )

Result I am looking for is True/False. I'm using CASE in my SQL query:

CASE WHEN @Variable > @Median - @Mode THEN 'True' ELSE 'False' END AS Result

Solution

  • SQL Server (and SQL in general) has a function to calculate medians. It has the intuitive name percentile_cont(). And, it only exists as a window function, not an aggregation function.

    You want a running median. Ideally, it would be nice to write:

    select se.*,
           avg(value) over (order by sequence) as avg_value,
           percentile_cont(0.5) over (within group order by sequence) over (order by sequence)
    from sampleexample se;
    

    But cumulative medians are no supported. So, that leaves the apply option:

    select se.*, se2.*
    from sampleexample se cross apply
         (select top (1)  percentile_cont(0.5) within group (order by value) over () as median,
                 avg(value) over () as avg_value
          from sampleexample se2
          where se2.sequence <= se.sequence
         ) se2;
    

    Here is a db<>fiddle.

    EDIT:

    I really read the question as the median and average, not median and mode (wishful reading on my part). For the mode, you do need a subquery, so:

    select se.*, se2.*
    from sesampleexample se cross apply
         (select top (1) percentile_cont(0.5) within group (order by value) over () as median,
                 avg(value) over () as avg_value,
                 value as mode
          from (select se2.*, count(*) over (partition by se2.value) as value_cnt
                from sampleexample se2
                where se2.sequence <= se.sequence
               ) se2
          order by se2.value_cnt desc
         ) se2