Search code examples
phpmysqlsqlranking

How to increment rank in mysql only when the 2 columns (duration and score) are different?


DETAILS

I've combined the following tables

testresults

--------------------------------------------------------------------
| index | uid|         start         |          stop        | score| 
--------------------------------------------------------------------
|   1   | 23 |   2012-06-06 07:30:20 | 2012-06-06 07:30:34  | 100  |
--------------------------------------------------------------------
|   2   | 34 |   2012-06-06 07:30:21 | 2012-06-06 07:30:40  | 100  |
--------------------------------------------------------------------

usertable

------------------------------
| id  |       username       |  
------------------------------
| 23  |    MacGyver’s mum    | 
------------------------------
| 34  |       Gribblet       | 
------------------------------

using this sql

SELECT a.username, b.duration, b.score
FROM usertable AS a
JOIN    (SELECT `uid`, `score`,
TIMESTAMPDIFF( SECOND, start, stop ) AS `duration`
FROM `testresults`
WHERE `start` >= DATE(NOW())
ORDER BY `score` DESC, `duration` ASC
LIMIT 100) AS b
ON a.id = b.uid

Problem is I want to rank the results. I think it is probably easier/faster to do it in sql as opposed to php, so based on http://code.openark.org/blog/mysql/sql-ranking-without-self-join this is what I tried

SELECT a.username, b.duration, b.score, COUNT(DISTINCT b.duration, b.score) AS rank
FROM usertable AS a
JOIN    (SELECT `uid`, `score`,
TIMESTAMPDIFF( SECOND, start, stop ) AS `duration`
FROM `testresults`
WHERE `start` >= DATE(NOW())
ORDER BY `score` DESC, `duration` ASC
LIMIT 100) AS b
ON a.id = b.uid

but I don't get back the expected ranks. It only returns one row.

QUESTION

What am I doing wrong? How can I increase rank only when duration and score are unique?

UPDATE1

Using bdenham's "slow method" worked for me, but the second method didn't. I don't really understand what is going on in the "fast method". I've posted the data I was using and the resulting table. You'll see that the ranking is messed up.

 -------------------------------------------------------------------
| index | uid|         start         |          stop        | score| 
--------------------------------------------------------------------
|   1   | 32 |  2012-08-27 05:47:18  |  2012-08-27 05:47:36 |  100 | 18s
|   2   | 32 |  2012-08-27 05:50:36  |  2012-08-27 05:50:42 |   0  |  6s
|   3   | 32 |  2012-08-27 05:51:18  |  2012-08-27 05:51:25 |  100 |  7s
|   4   | 32 |  2012-08-27 05:51:30  |  2012-08-27 05:51:35 |   0  |  5s
|   5   | 32 |  2012-08-27 05:51:39  |  2012-08-27 05:51:44 |   50 |  5s
--------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| username | score | duration | @prevScore:=@currScore | @prevDuration:=@currDuration | @currScore:=r.score | @currDuration:=timestampdiff(second,r.start,r.stop) |rank |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   bob    |  100  |    7     |     [BLOB - 1B]        |         [BLOB - 1B]          |     100             |                                7                    |  3  |
|   bob    |  100  |    18    |     [BLOB - 0B]        |         [BLOB - 0B]          |     100             |                               18                    |  1  |
|   bob    |   50  |    5     |     [BLOB - 1B]        |         [BLOB - 1B]          |      50             |                                5                    |  5  |
|   bob    |   0   |    5     |     [BLOB - 3B]        |         [BLOB - 1B]          |       0             |                                5                    |  4  |
|   bob    |   0   |    6     |     [BLOB - 3B]        |         [BLOB - 2B]          |       0             |                                6                    |  2  |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Solution

  • Both methods from the link in your question work with MySQL 5.5.25. Here is the SQL Fiddle. But I am not able to adapt the methods to your slightly more complicated model. You have an additional join, plus your rank is based on two columns instead of just one.

    Your attempt does not follow either method, though I suspect you were attempting to follow the slow "traditional" solution. As others have pointed out, that solution requires a self join and group by that you are completely lacking.

    Here is my broken attempt at adapting the slow method to your model. The problem is MySQL only preserves the username of the last row found for a given rank. Earlier rows with the same rank are discarded from the results. The query would not run on most databases because the GROUP BY does not include username. MySQL has non-standard rules for GROUP BY. I don't understand why your moderately complicated model doesn't work, but the simple linked model does work. I think it is a bad idea to have missing GROUP BY terms anyway.

    select u.username,
           r1.score,
           timestampdiff(second,r1.start,r1.stop) duration,
           count( distinct concat(r2.score,',',timestampdiff(second,r2.start,r2.stop)) ) rank
      from testresults r1
      join testresults r2
        on r2.score>r1.score
         or( r2.score=r1.score
             and
             timestampdiff(second,r2.start,r2.stop)<=timestampdiff(second,r1.start,r1.stop)
           )
      join usertable u
        on u.id=r1.uid
     where r1.start>=date(now())
       and r2.start>=date(now())
     group by r1.score, duration
     order by score desc, duration asc limit 100
    

    Here is a fix for the slow method. It first computes the rank for each unique score/duration pair, then joins that result with each test result. This works, but it is even slower than the original broken method.

    select username,
           r.score,
           r.duration,
           r.rank
      from testresults tr
      join usertable u
        on u.id=tr.uid
      join (
              select r1.score,
                     timestampdiff(second,r1.start,r1.stop) duration,
                     count( distinct concat(r2.score,',',timestampdiff(second,r2.start,r2.stop)) ) rank
                from testresults r1
                join testresults r2
                  on r2.score>r1.score
                   or( r2.score=r1.score
                       and
                       timestampdiff(second,r2.start,r2.stop)<=timestampdiff(second,r1.start,r1.stop)
                     )
               where r1.start>=date(now())
                 and r2.start>=date(now())
               group by r1.score, duration
           ) r
        on r.score=tr.score
       and r.duration=timestampdiff(second,tr.start,tr.stop)
     where tr.start>=date(now())
     order by rank limit 100
    

    Here is my broken attempt at adapting the fast method to your model. The method does not work because the selected variables are computed prior to the sort operation. Again I don't understand why the simple model in the link works but your model does not.

    select u.username,
           r.score,
           timestampdiff(second,r.start,r.stop) duration,
           @prevScore:=@currScore,
           @prevDuration:=@currDuration,
           @currScore:=r.score,
           @currDuration:=timestampdiff(second,r.start,r.stop),
           @rank:=if(@prevScore=@currScore and @prevDuration=@currDuration, @rank, @rank+1) rank
      from testresults r
      join usertable u
        on u.id=r.uid
      cross join (select @currScore:=null, @currDuration:=null, @prevScore:=null, @prevDuration:=null, @rank:=0) init
     where r.start>=date(now())
     order by score desc, duration asc limit 100
    

    Here is a "fixed" version of the fast method. But it relies on the order of the sorted rows in the subquery. In general a query should never rely on the order of rows unless there is an explicit SORT operation. The outer query is not sorted, and even if it were, I don't know if the variables would be computed before or after the outer sort.

    select username,
           score,
           duration,
           @prevScore:=@currScore,
           @prevDuration:=@currDuration,
           @currScoure:=score,
           @currDuration:=duration,
           @rank:=if(@prevScore=score and @prevDuration=duration, @rank, @rank+1) rank
      from (
              select u.username,
                     r.score,
                     timestampdiff(second,r.start,r.stop) duration
                from testresults r
                join usertable u
                  on u.id=r.uid
               where r.start>=date(now())
               order by score desc, duration asc limit 100
           ) scores,
           (
              select @currScore:=null, 
                     @currDuration:=null, 
                     @rank:=0
           ) init
    

    I think you will get just as good performance if you simply select the results without rank, ordered by score and duration. Your PHP can efficiently compute the rank since the results are already sorted. Your PHP can initialize rank to 0 and prev score and duration to null. Then compare each row with the previous values and increment the rank if there is a difference. The big advantage of letting PHP rank the sorted results is it should always work, regardless of the brand or version of the database engine. And it should still be fast.

    Here is the SQL Fiddle showing all 4 queries. I modified the WHERE clauses so that the queries continue to work for any date.