Search code examples
sqlsql-servert-sqlwindow-functions

Classifying recordsets based on counts


I have students that are either non-matriculated or matriculated in another school but are enrolled in one or more or our classes in one or more subjects. The view v_Enrollments returns a complete list of students (NetId) and the subject (Subj) of the class they are taking.

For accounting reasons, I need to assign each student to one and only one department (based on class subject). If they are taking multiple classes across different subjects, priority is based on subject count. If they are taking an equal number of classes in different subjects, then the tie is broken (pseudo)randomly.

I am not fluent with windowing functions and struggled a bit to come up with a solution. The below query works but I can't help but wonder if there is a better solution.

select NetId, Subj
from (
    select NetId, Subj, RowNum
        , Max(RowNum) Over (partition by NetId) as MaxRowNum
    from (
        select NetId, Subj
            , Count(*) as Cnt
            , MAX(Count(*)) Over(partition by NetId) as MaxCnt
            , ROW_NUMBER() Over (partition by NetId order by checksum(NetId, Subj)) as RowNum
                                                    -- psuedorandom but repeatable ordering
        from v_Enrollment
        group by NetId, Subj
    ) as s2
    where Cnt = MaxCnt
) as s1
where RowNum = MaxRowNum
order by NetId

Solution

  • I guess there is no need of so many operations. It can be done wit something like this:

    WITH DataSource AS
    (
        SELECT NetId
              ,Subj
              ,COUNT(*) AS CntClasses
        FROM v_Enrollment
        GROUP BY NetId
                ,Subj
    ), 
    DataSourceOrdered AS
    (
        SELECT NetId
              ,Subj
              ,ROW_NUMBER() OVER (ORDER BY CntClasses DESC, checksum(NetId, Subj)) AS [RowID]
        FROM DataSource
    )
    SELECT *
    FROM DataSourceOrdered
    WHERE [RowID] = 1
    ORDER BY NetId;
    

    We need to know the count of classes for each student-subject pair. Then using one ROW_NUMBER to order to subjects by your conditions - higher classes count and then your random number. After this is done, just get the rows with [RowID] = 1.