Search code examples
sqlmysqlaggregate-functions

SQL MaxSum query selecting incorrect value for a column


I have this SQL query (I'm using MySQL 8.0.35):

SELECT MAX(su) as value, activity_id, user_id, u.faculty_id AS faculty_id
    FROM (
        SELECT SUM(s.distance) as su, s.activity_id as activity_id, s.user_id as user_id, s.date
            FROM submission s
            WHERE s.week = 2 AND s.accepted = 1 AND s.season_id = 859
            GROUP BY s.date, s.user_id, s.activity_id
    ) as sums
INNER JOIN user u ON user_id = u.id
GROUP BY activity_id;

What it does is: calculate the MAX SUM of distance, and fetch the "other" related columns. In the result, there can be multiple rows (ideally one per activity).

However, it returns incorrect values for the "faculty_id" column.

I've tried putting the INNER JOIN on user table inside of the nested SELECT, using several different WHERE clauses, but nothing seems to return the correct result.

Partial user table:

Partial user table

Partial submission table: Partial submission table

With this data, I expect the result to be:

value activity_id user_id faculty_id
456 743 265 3375
456 744 265 3375

However, the result I'm getting is (notice the faculty_id):

value activity_id user_id faculty_id
456 743 265 3373
456 744 265 3375

Solution

  • As per my link in the comments above, abusing GROUP BY in this way returns arbitrary and non-deterministic results for the columns you neither aggregate nor group by.

    Instead, the normal approach to "greatest-n-per-group" is to use ROW_NUMBER()...

    WITH
      sub AS
    (
      SELECT
        SUM(s.distance) as su,
        s.activity_id as activity_id,
        s.user_id as user_id,
        s.date
      FROM
        submission s
      WHERE
            s.week = 2 
        AND s.accepted = 1
        AND s.season_id = 859
      GROUP BY
        s.date,
        s.user_id,
        s.activity_id
    ),
      sorted AS
    (
      SELECT
        *,
        ROW_NUMBER() OVER (
          PARTITION BY activity_id
              ORDER BY su DESC
        )
          AS su_rank
      FROM
        sub
    )
    SELECT
      *
    FROM
      sorted AS s
    INNER JOIN
      user   AS u
        ON u.id = s.user_id
    WHERE
      s.su_rank = 1