Search code examples
sqlsql-serversubquerygreatest-n-per-groupsql-server-2017

Return specific rows, cascading results in column - If no 1 then multiple 2s, else if no 2s then 3, else if no 3, then multiple 4s


I am working with student data in MS SQL and have some very specific rules to follow.

Sample Table

CREATE TABLE students (
    encounterId INT,
    studentId INT,
    positionId INT
);

INSERT INTO students
VALUES
(100,20,1),
(100,32,2),
(100,14,2),
(101,18,1),
(101,87,2),
(101,78,3),
(102,67,2),
(102,20,2),
(103,33,3),
(103,78,4),
(104,16,1),
(104,18,4),
(105,67,4),
(105,18,4),
(105,20,4);

Table rules

The table shows student encounters where students are placed in a position between 1 and 4.
There can be multiple students in an encounter.
There can be only one student in position 1 in an encounter.
There can be only one student in position 3 in an encounter.
However, multiple students can be in positions 2 and 4 in an encounter.

Business rules

The business rule is as follows for each encounter:

  • If the encounter has a student in position 1, return that encounter's row (singular position 1), removing any positions 2-4 rows for that encounter
  • ELSE if no position 1 THEN return the encounter's rows for students (can be multiple) in position 2, removing any positions 3 or 4 for that encounter
  • ELSE if no positions 1-2 THEN return the encounter's row for students in position 3, removing any position 4 rows for that encounter
  • ELSE if no positions 1-3 THEN return the encounter's rows for students in position 4

Not quite working

Concatenation of the studentId value is acceptable, but not ideal. I've got this semi-working with a wonky series of unions and string_aggs. The rows with positionId=3 are problematic, as I've remarked in the code.

Additionally, this union/not like architecture works in my small development DB but will have serious performance issues in production DBs:

WITH tAll
AS (   SELECT
           encounterId,
           studentId,
           positionId
       FROM
           students)

SELECT
    encounterId,
    CAST(studentId AS VARCHAR) AS [studentId],
    1                          AS [ord]
FROM
    tAll
WHERE
    positionId = 1
UNION
SELECT
    encounterId,
    CAST(studentId AS VARCHAR),
    2 AS [ord]
FROM
    (
        SELECT
            encounterId,
            STRING_AGG(studentId, ',')  AS [studentId],
            STRING_AGG(positionId, ',') AS [positionId]
        FROM
            tAll
        GROUP BY
            encounterId
    ) t2
WHERE
    positionId NOT LIKE '%1%'
    AND positionId NOT LIKE '%3%'
    AND positionId NOT LIKE '%4%'
UNION
SELECT
    encounterId,
    CAST(studentId AS VARCHAR),
    3 AS [ord]
FROM
    --tAll WHERE positionId=3 
    --Limiting to positionId=3 includes results (101,18,1) AND (101,78,3).. I just want (101,18,1)
    --Using the below code instead, but this creates other problems
    (
        SELECT
            encounterId,
            STRING_AGG(studentId, ',')  AS [studentId],
            STRING_AGG(positionId, ',') AS [positionId]
        FROM
            tAll
        GROUP BY
            encounterId
    ) t3
WHERE
    positionId NOT LIKE '%1%'
    AND positionId NOT LIKE '%2%'
    AND positionId NOT LIKE '%4%'
--This excludes 103 entirely since it has both positionId values of 3 AND 4... I just want (103,33,3)
UNION
SELECT
    encounterId,
    CAST(studentId AS VARCHAR),
    4 AS [ord]
FROM
    (
        SELECT
            encounterId,
            STRING_AGG(studentId, ',')  AS [studentId],
            STRING_AGG(positionId, ',') AS [positionId]
        FROM
            tAll
        GROUP BY
            encounterId
    ) t4
WHERE
    positionId NOT LIKE '%1%'
    AND positionId NOT LIKE '%2%'
    AND positionId NOT LIKE '%3%';

What I want returned

encounterId studentId ord
100 20 1
101 18 1
102 67 2
102 20 2
103 33 3
104 16 1
105 67 4
105 18 4
105 20 4

Solution

  • This is a top-1-per-group problem... with ties.

    You can use window function to rank() in a subquery to rank students within each encounter, then filter on the top record(s) per group in the outer query:

    select *
    from (
        select s.*,
            rank() over(partition by encounterid order by positionid) rn
        from students s
    ) s
    where rn = 1
    order by encounterid 
    

    Another option uses with ties - but you can't control the ordering of the rows in the resultset:

    select top (1) with ties *
    from students s
    order by rank() over(partition by encounterid order by positionid)
    

    Another typical solution is to filter with a correlated subquery:

    select *
    from students s
    where positionid = (select min(s1.positionid) from students s1 where s1.encounterid  = s.encounterid)