Search code examples
mysqlsqldateinner-joindense-rank

How to get the last n day score change for each user along with rank for multiple columns using MYSQL?


I have a MYSQL database that keeps track of all the users' daily total scores (and some other similar score/count type metrics like "badgesEarned", I am only including 2 fields here out of the 5 I need to track). It only has data for the days in which a user was active (earning score points or badges). So the db wont have data for every date there is.

Here's a toy example: Example Database Table: "User"

Now my goal is to get the last 7 days change in score for each user (I also need to do last 30 days and 365 day but let's stick to just 7 for this example). Since the db table stores a snapshot of total scores for all active days for each user, I wrote a SQL query that finds the two appropriate rows/snapshots and gets the difference in score/badges between them. These 2 rows would be the current date row (or if that doesnt exist, use the row just prior to it) vs the (current_date - 7)th row (or if that doesnt exist, use the row just prior to it).

To make matters worse, I also have to keep track of the "ranks" of each player via the dense_rank() SQL method and add that in as a column in the final result table.

There are 2 ways so far that I can achieve this using 2 different SQL queries.

My main question is - is one of these "better" in terms of performance/good practice/efficiency than the other? Or are they both horrendous and I have completely gone down the wrong route to begin with and totally missed a more efficient approach? I am not great with SQL stuff, so apologies in advance if the question and code examples are horrifying:

First Approach: Use multiple nested subqueries only (no join).

SELECT *, dense_rank() OVER (ORDER BY t3.score DESC) AS ranking
FROM
(
  SELECT t1.userId, 
                
                                (SELECT t2.score
                                FROM User t2 
                                WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
                                ORDER BY t2.date DESC LIMIT 1)
                                - 
                                (SELECT t2.score
                                FROM User t2 
                                WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
                                ORDER BY t2.date DESC LIMIT 1) as score,
                
                                (SELECT t2.badgesEarned
                                FROM User t2 
                                WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
                                ORDER BY t2.date DESC LIMIT 1)
                                - 
                                (SELECT t2.badgesEarned
                                FROM User t2 
                                WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
                                ORDER BY t2.date DESC LIMIT 1) as badgesEarned
                
    FROM User t1
    GROUP BY t1.userId) t3

Second Approach: Get 2 separate tables for each date point, then do Inner Join to subtract relevant columns.

SELECT *, dense_rank() OVER (ORDER BY T0.score_delta DESC) AS ranking
FROM
(SELECT T1.userId,
        (T1.score - T2.score),
        (T1.badgesEarned - T2.badgesEarned)

FROM 

(select *
from (
   select *, row_number() over (partition by userId order by date desc) as ranking
   from User
   where date<=date_add(CURDATE(),interval -7 day)
) t
where t.ranking = 1) as T2

INNER JOIN

(select *
from (
   select *, row_number() over (partition by userId order by date desc) as ranking
   from User
   where date<=CURDATE()
) t
where t.ranking = 1) as T1

on T1.userId= T2.userId ) T0

Side-question: One of my colleagues was suggesting that I handle the column subtractions in the code itself - like, I would call the database twice, get the two tables (one for CURDATE() and another for CURDATE-7), and then loop through all the User objects and subtract the relevant fields to construct my final result list. I'm not sure if that would be the better approach, so should I be doing that instead of handling it all through the SQL way?

Here's the SQLfiddle of the db if you want to play around with dummy data: http://sqlfiddle.com/#!9/86c58f0/1

Also, the above two code segments run just fine on my MySQL 8.0 workbench with no errors.


Solution

  • I'm not quite getting your expected results. But could you not just work with window functions, in conjunction with the RANGE clause?

    I'm just creating the central backbone table, and it will then be up to you to subtract whatever you need to subtract from each other, and finally to dense_rank() what you need to dense_rank(). Basically, I think you need to put a final select, containing DENSE_RANK() , to select from my with_a_week_before in-line table.

    WITH                                                                                                 
    -- your input
    usr(userid,dt,score,badgesearned) AS (
              SELECT 1234,DATE '2020-08-06', 100, 10
    UNION ALL SELECT 1234,DATE '2020-08-07', 120, 12
    UNION ALL SELECT 1234,DATE '2020-08-08', 130, 13
    UNION ALL SELECT 1234,DATE '2020-08-12', 140, 14
    UNION ALL SELECT 1234,DATE '2020-08-14', 150, 15
    UNION ALL SELECT  100,DATE '2020-08-05', 100, 10
    UNION ALL SELECT  100,DATE '2020-08-10', 100, 10
    UNION ALL SELECT  100,DATE '2020-08-14', 200, 10
    UNION ALL SELECT    1,DATE '2020-08-05', 140, 14
    UNION ALL SELECT    1,DATE '2020-08-08', 145, 14
    UNION ALL SELECT    1,DATE '2020-08-12', 150, 15
    )
    ,
    with_a_week_before AS (
      SELECT 
        *
      , FIRST_VALUE(score) OVER(
          PARTITION BY userid ORDER BY dt
          RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
        ) AS score_a_week
      , FIRST_VALUE(badgesearned) OVER(
          PARTITION BY userid ORDER BY dt
          RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
        ) AS badgesearned_a_week
      , FIRST_VALUE(dt) OVER( -- check the date of the previous row
          PARTITION BY userid ORDER BY dt
          RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
        ) AS dt_a_week
      FROM usr
    )
    SELECT * FROM with_a_week_before ORDER BY userid
    -- out  userid |     dt     | score | badgesearned | score_a_week | badgesearned_a_week | dt_a_week  
    -- out --------+------------+-------+--------------+--------------+---------------------+------------
    -- out       1 | 2020-08-05 |   140 |           14 |          140 |                  14 | 2020-08-05
    -- out       1 | 2020-08-08 |   145 |           14 |          140 |                  14 | 2020-08-05
    -- out       1 | 2020-08-12 |   150 |           15 |          140 |                  14 | 2020-08-05
    -- out     100 | 2020-08-05 |   100 |           10 |          100 |                  10 | 2020-08-05
    -- out     100 | 2020-08-10 |   100 |           10 |          100 |                  10 | 2020-08-05
    -- out     100 | 2020-08-14 |   200 |           10 |          100 |                  10 | 2020-08-10
    -- out    1234 | 2020-08-06 |   100 |           10 |          100 |                  10 | 2020-08-06
    -- out    1234 | 2020-08-07 |   120 |           12 |          100 |                  10 | 2020-08-06
    -- out    1234 | 2020-08-08 |   130 |           13 |          100 |                  10 | 2020-08-06
    -- out    1234 | 2020-08-12 |   140 |           14 |          100 |                  10 | 2020-08-06
    -- out    1234 | 2020-08-14 |   150 |           15 |          120 |                  12 | 2020-08-07