I am not a SQL expert, not even close. In the below example am using the PARTITION BY
clause in a PERCENT_RANK()
operation (Microsoft SQL Server hosted in Azure) to group my data for ranking which works as expected:
DECLARE @Dinky TABLE
(
[Id] INT PRIMARY KEY NOT NULL,
[Name] nvarchar(32) NOT NULL,
[Score] float NOT NULL,
[Type] nvarchar(4) NOT NULL,
[Pool] int NOT NULL
);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (1, 'Bob', 1.4, 'A', 1);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (2, 'Mary', 2.6, 'A', 2);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (3, 'Kim', 7.8, 'A', 3);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (4, 'Chris', 3.3, 'A', 1);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (5, 'Linda', 4.5, 'A', 1);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (6, 'Frank', 2.1, 'A', 2);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (7, 'Julie', 1.0, 'A', 3);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (8, 'Greg', 6.7, 'A', 3);
INSERT INTO @Dinky ([Id], [Name], [Score], [Type], [Pool]) VALUES (9, 'Tom', 8.3, 'A', 2);
SELECT [Id]
,[Name]
,[Score]
,[Type]
,[Pool]
,PERCENT_RANK() OVER (PARTITION BY [Type], [Pool] ORDER BY [Score] DESC) [Rank]
FROM @Dinky
This will produce the expected output with my rows being partitioned first by their [Type] (which I realize are all the same in this example) and then by their respective [Pool]:
Id Name Score Type Pool Rank
5 Linda 4.5 A 1 0
4 Chris 3.3 A 1 0.5
1 Bob 1.4 A 1 1
9 Tom 8.3 A 2 0
2 Mary 2.6 A 2 0.5
6 Frank 2.1 A 2 1
3 Kim 7.8 A 3 0
8 Greg 6.7 A 3 0.5
7 Julie 1 A 3 1
So far, so good - but of course it can't be that simple. In my particular business case, the rows in Pool 1 should be ranked along side all the rows in Pools 2 and 3 (think of it like a super-pool of everyone); further, the rows in Pool 2 should be ranked along with the rows in Pool 3 (a more exclusive pool); and of course the rows in Pool 3 are ranked only among themselves (the elite). Putting aside the question of 'how to express this' for a moment, the other part of the problem is that of duplicate rows. I only want to retain the rank result for the highest pool for each row. So in this example, Greg would be used for ranking in the 1-2-3 pool, and used for ranking in the 2-3 pool, and used for ranking in the 3 pool - but I only want his result for the 3 pool in my output.
After some fiddling around I was eventually able to get my desired result, but (I think) the result is far from elegant. I can't help but feel there is a better way to express this in SQL that I'm simply not fluent enough to produce. Here is what I came up with:
WITH poolOne AS
(
SELECT [Id]
,[Name]
,[Score]
,[Type]
,[Pool]
,PERCENT_RANK() OVER (PARTITION BY [Type] ORDER BY [Score] DESC) [Rank]
FROM @Dinky
WHERE [Pool] >= 1
), poolTwo AS
(
SELECT [Id]
,[Name]
,[Score]
,[Type]
,[Pool]
,PERCENT_RANK() OVER (PARTITION BY [Type] ORDER BY [Score] DESC) [Rank]
FROM @Dinky
WHERE [Pool] >= 2
), poolThree AS
(
SELECT [Id]
,[Name]
,[Score]
,[Type]
,[Pool]
,PERCENT_RANK() OVER (PARTITION BY [Type] ORDER BY [Score] DESC) [Rank]
FROM @Dinky
WHERE [Pool] >= 3
)
SELECT *
FROM poolThree
UNION
SELECT *
FROM poolTwo
WHERE NOT EXISTS (SELECT 1 FROM poolThree
WHERE poolTwo.Id = poolThree.Id)
UNION
SELECT *
FROM poolOne
WHERE NOT EXISTS (SELECT 1 FROM poolTwo
WHERE poolOne.Id = poolTwo.Id)
I get this output:
Id Name Score Type Pool Rank
5 Linda 4.5 A 1 0.375
4 Chris 3.3 A 1 0.5
1 Bob 1.4 A 1 0.875
9 Tom 8.3 A 2 0
2 Mary 2.6 A 2 0.6
6 Frank 2.1 A 2 0.8
3 Kim 7.8 A 3 0
8 Greg 6.7 A 3 0.5
7 Julie 1 A 3 1
Notice Kim, Greg and Julie are the same as they were in my original query because they are exclusively ranked within their pool (Pool 3). But the ranking for Tom, Mary and Frank have slightly changed because they were ranked with their pool (Pool 2) and the more exclusive pool (Pool 3). Finally, the rankings for Linda, Chris and Bob were ranked against everyone (Pools 1, 2 and 3).
I realize in my example above I could rewrite the WITH poolOne to remove the WHERE clause, and rewrite the WITH pool3 to remove the WHERE and change the PARTITION BY to it's original form. But for what it is (all things being equal), I think the logic is more clearly expressed the way it is (and is certainly better for an example asking for help).
What could I do to rewrite this in a more elegant, shorter, better, more performant, (pick your adjective) way? Can someone please dazzle me with their sql-foo?
I don't know if there's a way to avoid using PERCENT_RANK()
multiple times, but one way to do achieve this result without the unions would be with a CASE
expression.
SELECT [Id]
,[Name]
,[Score]
,[Type]
,[Pool]
,[Rank] =
CASE [Pool]
WHEN 3 THEN PERCENT_RANK() OVER (PARTITION BY [Type], [Pool] ORDER BY [Score] DESC)
WHEN 2 THEN PERCENT_RANK() OVER (PARTITION BY [Type], CASE WHEN [Pool] >= 2 THEN 1 END ORDER BY [Score] DESC)
WHEN 1 THEN PERCENT_RANK() OVER (PARTITION BY [Type] ORDER BY [Score] DESC)
END
FROM @Dinky
ORDER BY [Type], [Pool], [Score] DESC;