Search code examples
sql-serverrank

Ranking only the observations over a certain value


I have the following table:

DROP TABLE IF EXISTS #df

CREATE TABLE #df 
(
    CommID VARCHAR(10),
    ProvID VARCHAR(5),
    VisitCount int,
    [% Score] INT,
    TimePeriod VARCHAR(10),
    Median_VisitCount INT,
    Average_VisitCount INT
);

INSERT INTO #df (CommID, ProvID, VisitCount, [% Score], TimePeriod, Median_VisitCount, Average_VisitCount)
VALUES
('AB345', '001', '65', .45, 'ThisYear', 48.5, 42),
('AB345', '002', '67', .64, 'ThisYear', 48.5, 42),
('AB345', '003', '32', .78, 'ThisYear', 48.5, 42),
('AB345', '004', '4', .32, 'ThisYear', 48.5, 42),
('AB345', '001', '23', .45, 'LastYear', 42.5, 41),
('AB345', '002', '56', .64, 'LastYear', 48.5, 41),
('AB345', '003', '31', .78, 'LastYear', 48.5, 41),
('AB345', '004', '54', .32, 'LastYear', 48.5, 41)

SELECT * FROM #df

And would like my final output to be like this:

DROP TABLE IF EXISTS #final

CREATE TABLE #final 
(
    CommID VARCHAR(10),
    ProvID VARCHAR(5),
    VisitCount int,
    [% Score] INT,
    TimePeriod VARCHAR(10),
    Median_VisitCount INT,
    Average_VisitCount INT,
    Highest INT,
    Lowest INT
);

INSERT INTO #final (CommID, ProvID, VisitCount, [% Score], TimePeriod, Median_VisitCount, Average_VisitCount, Highest, Lowest)
VALUES
('AB345', '001', '65', .45, 'ThisYear', 48.5, 42, 3, 1),
('AB345', '002', '67', .64, 'ThisYear', 48.5, 42, 2, 2),
('AB345', '003', '32', .78, 'ThisYear', 48.5, 42, 1, 3),
('AB345', '004', '4', .32, 'ThisYear', 48.5, 42, NULL, NULL),
('AB345', '001', '23', .45, 'LastYear', 42.5, 41, NULL, NULL),
('AB345', '002', '56', .64, 'LastYear', 48.5, 41, 1, 2),
('AB345', '003', '31', .78, 'LastYear', 48.5, 41, NULL, NULL),
('AB345', '004', '54', .32, 'LastYear', 48.5, 41, 2, 1)

SELECT * FROM #final

For a given CommID and TimePeriod, I would like to rank the [% Score] but only for [VisitCounts] >= Average_VisitCount.

This is the code that I have written but the ranking is still considering the values that are below the Average_VisitCount. I would like any rows with visitcounts less than the AverageVisit count to not be considered in the ranking:

SELECT a.CommID
     , a.ProvID
     , a.VisitCount
     , a.[% Score]
     , CASE WHEN VisitCount >= a.Average_VisitCount 
            THEN RANK() OVER (PARTITION BY a.CommID, TimePeriod ORDER BY [a].[% Score] DESC) 
            ELSE NULL END AS Highest
     , CASE WHEN VisitCount >= a.Average_VisitCount 
            THEN RANK() OVER (PARTITION BY a.CommID, TimePeriod ORDER BY [a].[% Score]) 
            ELSE NULL END AS Lowest
     , a.TimePeriod
     , a.Median_VisitCount
     , a.Average_VisitCount 
FROM #df a 
ORDER BY a.CommID, a.TimePeriod, a.VisitCount DESC

Solution

  • Your demo data specified [% Score] as an INT, so I changed it to DECIMAL(5,2), otherwise every row scored 0.

    You're most of the way there. Basically, rank those that fall out of your range at the bottom so they don't interfere with the ranking you're interested in, and either leave them at the bottom of the ranking, or cause them to display a NULL:

    SELECT *, CASE WHEN VisitCount >= Average_VisitCount THEN DENSE_RANK() OVER (PARTITION BY CommID, TimePeriod ORDER BY CASE WHEN VisitCount >= Average_VisitCount THEN [% Score] ELSE 999 END) END
      FROM #df
    

    The inner CASE expression ranks them using the value of 999 (some arbitrary out of range value), the outer CASE expression causes the column to return NULL for those values.

    CommID ProvID VisitCount % Score TimePeriod Median_VisitCount Average_VisitCount Ranking
    AB345 004 54 0.32 LastYear 48 41 1
    AB345 002 56 0.64 LastYear 48 41 2
    AB345 003 31 0.78 LastYear 48 41
    AB345 001 23 0.45 LastYear 42 41
    AB345 001 65 0.45 ThisYear 48 42 1
    AB345 002 67 0.64 ThisYear 48 42 2
    AB345 003 32 0.78 ThisYear 48 42
    AB345 004 4 0.32 ThisYear 48 42