mysql sql greatest-n-per-group

Select sum of top three scores for each user


I am having trouble writing a query for the following problem. I have tried some existing queries but cannot get the results I need.

I have a results table like this:

userid  score timestamp
1       50    5000
1       100   5000 
1       400   5000 
1       500   5000 
2       100   5000 
3       1000  4000 

The expected output of the query is like this:

userid  score
3       1000
1       1000
2       100

I want to select a top list in which each user's n best scores (three in this example) are summed. If two users tie, the one with the lowest timestamp should rank highest. I looked through older posts but could not find one that solved this.

Here is what I have tried:

SELECT sum(score) FROM (
   SELECT score 
   FROM results 
   WHERE userid=1 ORDER BY score DESC LIMIT 3
) as subquery

This gives me the result for a single user, but I would like one query that returns the totals for all users in order.


Solution

  • This is a pretty typical greatest-n-per-group problem. When I see those, I usually use a correlated subquery like this:

    SELECT *
    FROM myTable m
    WHERE(
      SELECT COUNT(*)
      FROM myTable mT
      WHERE mT.userId = m.userId AND mT.score >= m.score) <= 3;
    

    This is not the whole solution: it returns each user's top three scores as separate rows. To get the totals, wrap that query in an outer query that groups by user and applies SUM(), like this:

    SELECT userId, SUM(score) AS totalScore
    FROM(
      SELECT userId, score
      FROM myTable m
      WHERE(
        SELECT COUNT(*)
        FROM myTable mT
        WHERE mT.userId = m.userId AND mT.score >= m.score) <= 3) tmp
    GROUP BY userId;
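
To check the two queries above against the sample data from the question, here is a minimal sketch. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL and reuses the answer's table and column names; the ORDER BY in the first query is added only to make the printed order predictable.

```python
import sqlite3

# In-memory copy of the question's sample data, using the answer's
# names (myTable, userId, score, timeCol). SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (userId INTEGER, score INTEGER, timeCol INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, 50, 5000), (1, 100, 5000), (1, 400, 5000),
     (1, 500, 5000), (2, 100, 5000), (3, 1000, 4000)],
)

# Step 1: keep a row only if at most three rows of the same user
# have a score greater than or equal to it.
top_rows = conn.execute("""
    SELECT userId, score
    FROM myTable m
    WHERE (
      SELECT COUNT(*)
      FROM myTable mT
      WHERE mT.userId = m.userId AND mT.score >= m.score) <= 3
    ORDER BY userId, score DESC
""").fetchall()

# Step 2: sum those rows per user.
totals = dict(conn.execute("""
    SELECT userId, SUM(score) AS totalScore
    FROM (
      SELECT userId, score
      FROM myTable m
      WHERE (
        SELECT COUNT(*)
        FROM myTable mT
        WHERE mT.userId = m.userId AND mT.score >= m.score) <= 3) tmp
    GROUP BY userId
""").fetchall())

print(top_rows)  # [(1, 500), (1, 400), (1, 100), (2, 100), (3, 1000)]
print(totals)    # {1: 1000, 2: 100, 3: 1000}
```

User 1's lowest score (50) is the only row filtered out, which is why the total for user 1 is 1000 rather than 1050.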
    


    EDIT

    Regarding the ordering (which I forgot the first time through), you can order by totalScore in descending order and then by MIN(timeCol) in ascending order, so that among users with equal totals the one with the lowest timestamp appears first. The inner query has to select timeCol as well for this to work. Here is the updated query:

    SELECT userId, SUM(score) AS totalScore
    FROM(
      SELECT userId, score, timeCol
      FROM myTable m
      WHERE(
        SELECT COUNT(*)
        FROM myTable mT
        WHERE mT.userId = m.userId AND mT.score >= m.score) <= 3) tmp
    GROUP BY userId
    ORDER BY totalScore DESC, MIN(timeCol) ASC;
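
As a quick check that the ordered query reproduces the expected output from the question, the sketch below runs it on the sample data. It again assumes SQLite via Python's sqlite3 module in place of MySQL; the query text is the one shown above.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite table
# (an assumption for testing; the original targets MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (userId INTEGER, score INTEGER, timeCol INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, 50, 5000), (1, 100, 5000), (1, 400, 5000),
     (1, 500, 5000), (2, 100, 5000), (3, 1000, 4000)],
)

# Highest total first; equal totals are ordered by the earliest timestamp.
ranking = conn.execute("""
    SELECT userId, SUM(score) AS totalScore
    FROM (
      SELECT userId, score, timeCol
      FROM myTable m
      WHERE (
        SELECT COUNT(*)
        FROM myTable mT
        WHERE mT.userId = m.userId AND mT.score >= m.score) <= 3) tmp
    GROUP BY userId
    ORDER BY totalScore DESC, MIN(timeCol) ASC
""").fetchall()

print(ranking)  # [(3, 1000), (1, 1000), (2, 100)]
```

Users 3 and 1 both total 1000; user 3 comes first because its timestamp (4000) is lower than user 1's (5000).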
    


    EDIT 2

    As JPW pointed out in the comments, this query will not work if the user has the same score for multiple questions: tied rows all get the same count, so a tie at the cutoff can push every tied row over the limit of three and drop them all. To settle this, you can break ties inside the subquery by timestamp as well, so that a row is outranked only by rows with a higher score, or with the same score and a lower or equal timestamp, like this:

    SELECT userId, SUM(score) AS totalScore
    FROM(
      SELECT userId, score, timeCol
      FROM myTable m
      WHERE(
        SELECT COUNT(*)
        FROM myTable mT
        WHERE mT.userId = m.userId
          AND (mT.score > m.score
            OR (mT.score = m.score AND mT.timeCol <= m.timeCol))) <= 3) tmp
    GROUP BY userId
    ORDER BY totalScore DESC, MIN(timeCol) ASC;
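
To make the tie problem concrete, the following sketch compares two ranking conditions on a user whose four best scores are identical. The data is made up for illustration, SQLite (via Python's sqlite3) is assumed in place of MySQL, and the helper name `totals` is hypothetical. The second condition counts a row as outranking another only if its score is higher, or the score is equal and its timestamp is lower or equal.

```python
import sqlite3

# Illustrative data (not from the question): four tied scores of 100
# with distinct timestamps, plus one lower score.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (userId INTEGER, score INTEGER, timeCol INTEGER)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [(1, 100, 1), (1, 100, 2), (1, 100, 3), (1, 100, 4), (1, 50, 5)],
)

def totals(condition):
    """Sum each user's rows whose rank under `condition` is at most 3."""
    query = f"""
        SELECT userId, SUM(score) AS totalScore
        FROM (
          SELECT userId, score
          FROM myTable m
          WHERE (
            SELECT COUNT(*)
            FROM myTable mT
            WHERE mT.userId = m.userId AND ({condition})) <= 3) tmp
        GROUP BY userId
    """
    return conn.execute(query).fetchall()

# Plain comparison: every 100 is outranked by four rows, so all are dropped.
plain = totals("mT.score >= m.score")

# Ties broken by timestamp: the three earliest 100s rank 1, 2 and 3.
tie_broken = totals(
    "mT.score > m.score OR (mT.score = m.score AND mT.timeCol <= m.timeCol)"
)

print(plain)       # []
print(tie_broken)  # [(1, 300)]
```

With the plain comparison the user vanishes from the result entirely, while the timestamp tiebreak yields the intended total of 300.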
    

    I am still working out how to handle the scenario where the userid, score, and timestamp are all the same. In that case, you will need another tiebreaker. Perhaps you have a primary key column, and you can prefer the higher or lower primary key?
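
One way the primary-key idea could look is sketched below. It assumes a hypothetical auto-increment column named `id` (not part of the original table) and again uses SQLite via Python's sqlite3 in place of MySQL. Rows are ranked by score, then by lower timestamp, then by lower id, so no two rows of a user can share a rank.

```python
import sqlite3

# `id` is a hypothetical primary key added for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable ("
    "id INTEGER PRIMARY KEY, userId INTEGER, score INTEGER, timeCol INTEGER)"
)
# User 1 has four rows that are identical apart from the key.
conn.executemany(
    "INSERT INTO myTable (userId, score, timeCol) VALUES (?, ?, ?)",
    [(1, 100, 5000), (1, 100, 5000), (1, 100, 5000), (1, 100, 5000),
     (2, 100, 5000), (2, 70, 5000)],
)

# A row is outranked by rows with a higher score, then by equal-score rows
# with a lower timestamp, then by fully tied rows with a lower or equal id.
ranking = conn.execute("""
    SELECT userId, SUM(score) AS totalScore
    FROM (
      SELECT userId, score, timeCol
      FROM myTable m
      WHERE (
        SELECT COUNT(*)
        FROM myTable mT
        WHERE mT.userId = m.userId
          AND (mT.score > m.score
            OR (mT.score = m.score AND mT.timeCol < m.timeCol)
            OR (mT.score = m.score AND mT.timeCol = m.timeCol
                AND mT.id <= m.id))) <= 3) tmp
    GROUP BY userId
    ORDER BY totalScore DESC, MIN(timeCol) ASC
""").fetchall()

print(ranking)  # [(1, 300), (2, 170)]
```

Because the key is unique, exactly three of user 1's four identical rows are kept; which three does not matter for the sum.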