How can I get a RANK that restarts at partition change? I have this table:
ID Date Value
1 2015-01-01 1
2 2015-01-02 1 <redundant
3 2015-01-03 2
4 2015-01-05 2 <redundant
5 2015-01-06 1
6 2015-01-08 1 <redundant
7 2015-01-09 1 <redundant
8 2015-01-10 2
9 2015-01-11 3
10 2015-01-12 3 <redundant
and I'm trying to delete all the rows where the Value is not changed from the previous entry (marked with < redundant). I've tried using cursors but it takes too long, as the table has ~50 million rows.
I've also tried using RANK:
SELECT ID, Date, Value,
RANK() over(partition by Value order by Date ASC) Rank,
FROM DataLogging
ORDER BY Date ASC
but I get:
ID Date Value Rank (Rank)
1 2015-01-01 1 1 (1)
2 2015-01-02 1 2 (2)
3 2015-01-03 2 1 (1)
4 2015-01-05 2 2 (2)
5 2015-01-06 1 3 (1)
6 2015-01-08 1 4 (2)
7 2015-01-09 1 5 (3)
8 2015-01-10 2 3 (1)
9 2015-01-11 3 1 (1)
10 2015-01-12 3 2 (2)
in parantheses is the Rank I would want, so that I can filter out rows with Rank = 1 and delete the rest of the rows.
EDIT: I've accepted the answer that seemed the easiest to write, but unfortunately none of the answers runs fast enough for deleting the rows. In the end I've decided to use the CURSOR afterall. I've split the data in chuncks of about 250k rows and the cursor runs through and deletes the rows in ~11 mins per batch of 250k rows, and the answers below, with DELETE, take ~35 mins per batch of 250k rows.
select *
from ( select ID, Date, Value, lag(Value, 1, 0) over (order by ID) as ValueLag
from table ) tt
where ValueLag is null or ValueLag <> Value
if the order is Date then over (order by Date)
this should show you good and bad - it is based on ID - it you need date then revise
it may look like a long way around but it should be pretty efficient
declare @tt table (id tinyint, val tinyint);
insert into @tt values
( 1, 1),
( 2, 1),
( 3, 2),
( 4, 2),
( 5, 1),
( 6, 1),
( 7, 1),
( 8, 2),
( 9, 3),
(10, 3);
select id, val, LAG(val) over (order by id) as lagVal
from @tt;
-- find the good
select id, val
from ( select id, val, LAG(val) over (order by id) as lagVal
from @tt
) tt
where lagVal is null or lagVal <> val
-- select the bad
select tt.id, tt.val
from @tt tt
left join ( select id, val
from ( select id, val, LAG(val) over (order by id) as lagVal
from @tt
) ttt
where ttt.lagVal is null or ttt.lagVal <> ttt.val
) tttt
on tttt.id = tt.id
where tttt.id is null