Search code examples
sqlsql-serverwindow-functions

RANK() OVER PARTITION with RANK resetting


How can I get a RANK that restarts at partition change? I have this table:

ID    Date        Value  
1     2015-01-01  1  
2     2015-01-02  1 <redundant  
3     2015-01-03  2  
4     2015-01-05  2 <redundant  
5     2015-01-06  1  
6     2015-01-08  1 <redundant  
7     2015-01-09  1 <redundant  
8     2015-01-10  2  
9     2015-01-11  3  
10    2015-01-12  3 <redundant  

and I'm trying to delete all the rows where the Value is not changed from the previous entry (marked with < redundant). I've tried using cursors but it takes too long, as the table has ~50 million rows.

I've also tried using RANK:

SELECT ID, Date, Value,
RANK() over(partition by Value order by Date ASC) Rank,
FROM DataLogging 
ORDER BY Date ASC 

but I get:

ID    Date        Value  Rank   (Rank)
1     2015-01-01  1      1      (1)
2     2015-01-02  1      2      (2)
3     2015-01-03  2      1      (1)
4     2015-01-05  2      2      (2)
5     2015-01-06  1      3      (1)
6     2015-01-08  1      4      (2)
7     2015-01-09  1      5      (3)
8     2015-01-10  2      3      (1)
9     2015-01-11  3      1      (1)
10    2015-01-12  3      2      (2)

in parantheses is the Rank I would want, so that I can filter out rows with Rank = 1 and delete the rest of the rows.

EDIT: I've accepted the answer that seemed the easiest to write, but unfortunately none of the answers runs fast enough for deleting the rows. In the end I've decided to use the CURSOR afterall. I've split the data in chuncks of about 250k rows and the cursor runs through and deletes the rows in ~11 mins per batch of 250k rows, and the answers below, with DELETE, take ~35 mins per batch of 250k rows.


Solution

  • select * 
    from  ( select ID, Date, Value, lag(Value, 1, 0) over (order by ID) as ValueLag 
            from table ) tt
    where ValueLag is null or ValueLag <> Value  
    

    if the order is Date then over (order by Date)

    this should show you good and bad - it is based on ID - it you need date then revise
    it may look like a long way around but it should be pretty efficient

    declare @tt table  (id tinyint, val tinyint);
    insert into @tt values 
    ( 1, 1),
    ( 2, 1),
    ( 3, 2),
    ( 4, 2),
    ( 5, 1),
    ( 6, 1),
    ( 7, 1),
    ( 8, 2),
    ( 9, 3),
    (10, 3);
    
    select id, val, LAG(val) over (order by id) as lagVal
    from @tt;
    
    -- find the good
    select id, val 
    from ( select id, val, LAG(val) over (order by id) as lagVal
           from @tt 
         ) tt
    where  lagVal is null or lagVal <> val 
    
    -- select the bad 
    select tt.id, tt.val 
      from @tt tt
      left join ( select id, val 
                    from ( select id, val, LAG(val) over (order by id) as lagVal
                             from @tt 
                         ) ttt
                   where   ttt.lagVal is null or ttt.lagVal <> ttt.val 
                ) tttt 
        on tttt.id = tt.id 
     where tttt.id is null