Search code examples
mysqljoingroup-bysubquerygreatest-n-per-group

MySQL: GROUP BY + HAVING MAX(...) --- Why HAVING MAX(grade) will not return maximum grade?


Credit Leetcode 1112. Highest Grade For Each Student

Requirement: Write a SQL query to find the highest grade with its corresponding course for each student. In case of a tie, you should find the course with the smallest course_id. The output must be sorted by increasing student_id.

The query result format is in the following example:

Enrollments table:
+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 2          | 2         | 95    |
| 2          | 3         | 95    |
| 1          | 1         | 90    |
| 1          | 2         | 99    |
| 3          | 1         | 80    |
| 3          | 2         | 75    |
| 3          | 3         | 82    |
+------------+-----------+-------+

Result table:
+------------+-------------------+
| student_id | course_id | grade |
+------------+-----------+-------+
| 1          | 2         | 99    |
| 2          | 2         | 95    |
| 3          | 3         | 82    |

Why this cannot work?

select student_id, course_id, grade
from enrollments
group by student_id
having max(grade)
order by student_id

I thought the return should be {"headers": ["student_id", "course_id", "grade"], "values": [[1, 2, 99], [2, 2, 95], [3, 3, 82]]}; however, the actual return is {"headers": ["student_id", "course_id", "grade"], "values": [[1, 1, 90], [2, 2, 95], [3, 1, 80]]}.

Thank you so much if anyone can help me!


Solution

  • Maybe you think that this condition:

    having max(grade)
    

    is an instruction so that only the rows that have the max grade for each studentid should be returned.
    This is not what a HAVING clause does.
    It is a way to filter aggregated data after aggregation is done when it is used after a GROUP BY clause.
    The HAVING clause accepts 1 or more boolean expressions which evaluate to TRUE or FALSE.
    So in this case max(grade) is not a boolean expression (although for MySql any numeric expression can be used in the place of a boolean one).

    I understand that you want in the results the rows with the max grade of each studentid.
    This can be done in the most efficient and performant way with window functions in MySQL 8.0: ROW_NUMBER() or RANK() if you want ties returned also:

    select e.student_id, e.course_id, e.grade
    from (
      select *, row_number() over (partition by student_id order by grade desc) rn
      from Enrollments
    ) e
    where e.rn = 1
    

    See the demo.
    Results:

    | student_id | course_id | grade |
    | ---------- | --------- | ----- |
    | 1          | 2         | 99    |
    | 2          | 2         | 95    |
    | 3          | 3         | 82    |