Search code examples
sql-servert-sqlfrequencyquantile

Calculate frequencies over deciles


Let's say I have SQL Server tables detailing marathon races, all of them alike. Each row is a participant and one of the columns is the finish time, [dbo].[marathon_X].[finish]. Let's assume [finish] is counted in minutes (int) for simplicity.

I'm stuck and could use help figuring out a query that divides the finish times for a certain race into deciles and counts the number of participants finishing within each decile. This so I can get an idea of the frequency distribution over different time segments (which, in this case, I expect to be something other than even, or normal for that matter).

So for example if the winner in a certain race finishes after 130 min and the last participant after 520 minutes, then in that particular race each finish time decile would be (MAX(finish) - MIN(finish))/10 = 39 min wide. And then 1st decile would be finish times within 130-168 min, 2nd would be 169-207 min, etc. I then need a count of the number of runners in each decile, the range of which will vary from race to race.

Probably stupidly simple but I haven't figured out how to NTILE this (or equivalent).


Solution

  • Not sure if you were looking to aggregate the final results, but that would be a small matter in the final select.

    The cte will create the desired ranges. Then it becomes a small matter of joining the two sets of data.

    Example

    Declare @YourTable table (Participant int, FinishTime int)
    Insert Into @YourTable values
     (1,130)  -- Top 
    ,(2,510)  -- Last
    ,(3,150)  -- Random ...
    ,(4,255)
    ,(5,460)
    ,(6,325)
    
    ;with cte as (
          Select Decile
                ,R1  = (MinV) + ( RngV * (Decile-1)) + (Decile-1)
                ,R2  = (MinV) + ( RngV * (Decile-1)) + (Decile-1) + RngV
           From  (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) ) A(Decile)
           Cross Join ( Select MinV = min(FinishTime)
                              ,MaxV = max(FinishTime)
                              ,RngV = (max(FinishTime)-min(FinishTime))/10  
                         From  @YourTable ) B
    )
    Select A.*
          ,B.Decile
     From  @YourTable A
     Join  cte B on FinishTime between R1 and R2
     Order By FinishTime
    

    Results

    Participant FinishTime  Decile
    1           130         1
    3           150         1
    4           255         4
    6           325         6
    5           460         9
    2           510         10