I have this SQL query (I'm using MySQL 8.0.35):
SELECT MAX(su) as value, activity_id, user_id, u.faculty_id AS faculty_id
FROM (
SELECT SUM(s.distance) as su, s.activity_id as activity_id, s.user_id as user_id, s.date
FROM submission s
WHERE s.week = 2 AND s.accepted = 1 AND s.season_id = 859
GROUP BY s.date, s.user_id, s.activity_id
) as sums
INNER JOIN user u ON user_id = u.id
GROUP BY activity_id;
What it does is: calculate the MAX SUM of distance, and fetch the "other" related columns. In the result, there can be multiple rows (ideally one per activity).
However, it returns incorrect values for the "faculty_id" column.
I've tried putting the INNER JOIN on user table inside of the nested SELECT, using several different WHERE clauses, but nothing seems to return the correct result.
Partial user table:
With this data, I expect the result to be:
value | activity_id | user_id | faculty_id |
---|---|---|---|
456 | 743 | 265 | 3375 |
456 | 744 | 265 | 3375 |
However, the result I'm getting is (notice the faculty_id):
value | activity_id | user_id | faculty_id |
---|---|---|---|
456 | 743 | 265 | 3373 |
456 | 744 | 265 | 3375 |
As per my link in the comments above, abusing GROUP BY in this way returns arbitrary and non-deterministic results for the columns you neither aggregate nor group by.
Instead, the normal approach to "greatest-n-per-group" is to use ROW_NUMBER()...
WITH
sub AS
(
SELECT
SUM(s.distance) as su,
s.activity_id as activity_id,
s.user_id as user_id,
s.date
FROM
submission s
WHERE
s.week = 2
AND s.accepted = 1
AND s.season_id = 859
GROUP BY
s.date,
s.user_id,
s.activity_id
),
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY activity_id
ORDER BY su DESC
)
AS su_rank
FROM
sub
)
SELECT
*
FROM
sorted AS s
INNER JOIN
user AS u
ON u.id = s.user_id
WHERE
s.su_rank = 1