mysql join group-by aggregate-functions greatest-n-per-group

Get row of data for MAX value over GROUP BY and JOINS

What I've done so far

Since the count wasn't explicitly defined, I wrote a query to find it.

   SELECT age, movieid, COUNT(*) AS mcount
   FROM ratings
   JOIN users ON id = userid
   GROUP BY age, movieid

This gave me the count for every movie for every age.

To find the maximum for each age

    SELECT age, MAX(mcount) AS mc
    FROM (
        SELECT age, movieid, COUNT(*) AS mcount
        FROM ratings
        JOIN users ON id = userid
        GROUP BY age, movieid
    ) t1
    GROUP BY age

This gave me the age and the max count. However, I also want the corresponding movie ID and this where I have been stuck. My thought process is to join these results with the first table but it is not working. Are there any other alternatives I can try? This is the query I used.


SELECT users.age, ratings.movieid, count(*) as mc2
FROM ratings JOIN users ON id = userid 
INNER JOIN
(
    SELECT age, MAX(mcount) AS mc
    FROM (
        SELECT age, movieid, COUNT(*) AS mcount
        FROM ratings
        JOIN users ON id = userid
        GROUP BY age, movieid
    ) t1
    GROUP BY age
)t2
ON t2.age = users.age
WHERE mc2=t2.mc
GROUP BY users.age, ratings.movieid;

Solution

You can do it like this:

SELECT t.age, t.movieid, t.mcount
FROM (
   SELECT age, movieid, COUNT(*) AS mcount
   FROM ratings
   JOIN users ON id = userid
   GROUP BY age, movieid
) t
LEFT JOIN (
   SELECT age, movieid, COUNT(*) AS mcount
   FROM ratings
   JOIN users ON id = userid
   GROUP BY age, movieid
) t2
ON t.age = t2.age AND
   t.movieid <> t2.movieid AND
   t.mcount < t2.mcount
WHERE t2.age IS NULL

Explanation:

first subselect
- joins ratings and users
- on the id of users
- groups by age and movieid
- to find the groups
- selects the fields you need
- yields all groups
second subselect
- is the same as the first
we LEFT JOIN the two groups by
- matching age
- different movieid (so we will have sensible count comparisons)
- and the first mcount being smaller than the second
- because we want to find items of the first group
- which do not have higher count matches
- so we search for higher count matches
and in the WHERE clause we exclude the scenario when we found higher count matches in the second group than in the first group

Fiddle: http://sqlfiddle.com/#!9/f8defe/6