I am working with student data in MS SQL and have some very specific rules to follow.
CREATE TABLE students (
encounterId INT,
studentId INT,
positionId INT
);
INSERT INTO students
VALUES
(100,20,1),
(100,32,2),
(100,14,2),
(101,18,1),
(101,87,2),
(101,78,3),
(102,67,2),
(102,20,2),
(103,33,3),
(103,78,4),
(104,16,1),
(104,18,4),
(105,67,4),
(105,18,4),
(105,20,4);
The table shows student encounters where students are placed in a position between 1 and 4.
There can be multiple students in an encounter.
There can be only one student in position 1 in an encounter.
There can be only one student in position 3 in an encounter.
However, multiple students can be in positions 2 and 4 in an encounter.
The business rule is as follows for each encounter:
Concatenation of the studentId value is acceptable, but not ideal. I've got this semi-working with a wonky series of unions and string_aggs. The rows with positionId=3 are problematic, as I've remarked in the code.
Additionally, this union/not like architecture works in my small development DB but will have serious performance issues in production DBs:
WITH tAll
AS ( SELECT
encounterId,
studentId,
positionId
FROM
students)
SELECT
encounterId,
CAST(studentId AS VARCHAR) AS [studentId],
1 AS [ord]
FROM
tAll
WHERE
positionId = 1
UNION
SELECT
encounterId,
CAST(studentId AS VARCHAR),
2 AS [ord]
FROM
(
SELECT
encounterId,
STRING_AGG(studentId, ',') AS [studentId],
STRING_AGG(positionId, ',') AS [positionId]
FROM
tAll
GROUP BY
encounterId
) t2
WHERE
positionId NOT LIKE '%1%'
AND positionId NOT LIKE '%3%'
AND positionId NOT LIKE '%4%'
UNION
SELECT
encounterId,
CAST(studentId AS VARCHAR),
3 AS [ord]
FROM
--tAll WHERE positionId=3
--Limiting to positionId=3 includes results (101,18,1) AND (101,78,3).. I just want (101,18,1)
--Using the below code instead, but this creates other problems
(
SELECT
encounterId,
STRING_AGG(studentId, ',') AS [studentId],
STRING_AGG(positionId, ',') AS [positionId]
FROM
tAll
GROUP BY
encounterId
) t3
WHERE
positionId NOT LIKE '%1%'
AND positionId NOT LIKE '%2%'
AND positionId NOT LIKE '%4%'
--This excludes 103 entirely since it has both positionId values of 3 AND 4... I just want (103,33,3)
UNION
SELECT
encounterId,
CAST(studentId AS VARCHAR),
4 AS [ord]
FROM
(
SELECT
encounterId,
STRING_AGG(studentId, ',') AS [studentId],
STRING_AGG(positionId, ',') AS [positionId]
FROM
tAll
GROUP BY
encounterId
) t4
WHERE
positionId NOT LIKE '%1%'
AND positionId NOT LIKE '%2%'
AND positionId NOT LIKE '%3%';
encounterId | studentId | ord |
---|---|---|
100 | 20 | 1 |
101 | 18 | 1 |
102 | 67 | 2 |
102 | 20 | 2 |
103 | 33 | 3 |
104 | 16 | 1 |
105 | 67 | 4 |
105 | 18 | 4 |
105 | 20 | 4 |
This is a top-1-per-group problem... with ties.
You can use window function to rank()
in a subquery to rank students within each encounter, then filter on the top record(s) per group in the outer query:
select *
from (
select s.*,
rank() over(partition by encounterid order by positionid) rn
from students s
) s
where rn = 1
order by encounterid
Another option uses with ties
- but you can't control the ordering of the rows in the resultset:
select top (1) with ties *
from students s
order by rank() over(partition by encounterid order by positionid)
Another typical solution is to filter with a correlated subquery:
select *
from students s
where positionid = (select min(s1.positionid) from students s1 where s1.encounterid = s.encounterid)