sqlsql-serversql-server-2008

Keep newest value by duplicate dates


I have a table called 'staff':

Account   Score   UpdateTime                UpdateTime_order               
K1897     A       2023-09-08 14:57:58.113   1
K1897     B       2023-09-08 14:57:57.896   2
K1897     B       2023-08-01 10:07:57.487   3
K1897     B       2023-06-28 07:23:57.696   4
K1897     B       2023-06-05 14:20:13.789   5
K1898     C       2023-06-04 14:20:13.789   1

Every staff can only have one score per day, so Account K1897's score should be A at 2023-09-08. (The score changed from B to A that day)

To solve this problem, I decided to convert datetime to date format, then kept the newest UpdateTime when they are duplicate.

For example, I deselected the row when UpdateTime_order=2 from Account K1897 because its original UpdateTime 2023-09-08 14:57:57.896 < 2023-09-08 14:57:58.113

Account   Score   UpdateTime   UpdateTime_order               
K1897     A       2023-09-08   1      
K1897     B       2023-08-01   3
K1897     B       2023-06-28   4
K1897     B       2023-06-05   5
K1898     C       2023-06-04   1

Then reordered the UpdateTime_order based on new result.

What I expected:

Account   Score   UpdateTime   UpdateTime_order             
K1897     A       2023-09-08   1
K1897     B       2023-08-01   2
K1897     B       2023-06-28   3
K1897     B       2023-06-05   4
K1898     C       2023-06-04   1

My code:

;WITH CTE_staff AS (
select 
Account, 
Score, 
CAST([UpdateTime] AS Date) UpdateDate,
UpdateTime, 
UpdateTime_order
FROM staff
)
select 
Account, 
Score, 
UpdateDate,
ROW_NUMBER()OVER(PARTITION BY Account ORDER BY UpdateTime DESC) as UpdateTime_order3
from(
select *,
ROW_NUMBER()OVER(PARTITION BY Account, UpdateDate ORDER BY UpdateTime DESC) as UpdateTime_order2 
from CTE_staff
) jj
where jj.UpdateTime_order2=1

It ran succesfully, but I think I wrote it in a complicated way by creating new columns. Wandering if there's an easy way to do this?

fiddle: https://dbfiddle.uk/dJ1qw3Lt


Solution

  • Here's how to accomplish it using one ROW_NUMBER() to get the most recently updated row per account and day :

    SELECT Account, Score, UpdateTime, UpdateTime_order
    FROM (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY Account, CAST(UpdateTime AS Date) ORDER BY UpdateTime DESC) AS RN
      FROM staff
    ) AS S
    WHERE rn = 1
    ORDER BY UpdateTime DESC
    

    It is also possible to use GROUP BY and the aggregate function MAX(): We need first determine the maximum update time per account and day:

    SELECT Account, MAX(UpdateTime) AS MAX_UpdateTime
    FROM staff
    GROUP BY Account, CAST(UpdateTime AS DATE)
    

    Then Join this dataset with the table as follow :

    SELECT s.*
    FROM staff s
    INNER JOIN (
      SELECT Account, MAX(UpdateTime) AS MAX_UpdateTime
      FROM staff
      GROUP BY Account, CAST(UpdateTime AS DATE)
    ) AS t ON S.Account = t.Account AND s.UpdateTime = t.MAX_UpdateTime
    ORDER BY s.UpdateTime DESC
    

    Demo here