Search code examples
sqlsql-serversql-deleterow-number

Delete or change records in ETL 2


I would like to follow up with a question from my earlier POST: Delete or change records in ETL

The problem mentioned there was solved with the following:

 ; with todelete as (
      select *, 
           count(*) over (partition by label) as cnt, 
           lag(cost) over (partition by label order by time ASC) as lastcost
           ROW_NUMBER() over (partition by label order by time ASC) as r_number
      from Table1
     )
delete from todelete 
    where cnt > 1 and r_number between 1 and (cnt/2)*2 and  cost=ISNULL(lastcost,cost)

, during testing, however, I came across one problem when these movements occur in the table (they cannot be prevented).:

label   cost   time
x2       29    14/5/2020 01:00:00
x3       20    14/5/2020 01:02:00
x2       29    15/5/2020 03:12:02

That for the same 'Label' I have two identical lines (except timestamp) with the same 'cost'. And just the above solution will delete these two records. Which I need only the older one to be deleted.

Thx for the advice

UPDATE:

MY goal is.

i have table with records:

label   cost   time
x2       29    14/5/2020 01:00:00
x3       20    14/5/2020 01:02:00
x2       30    15/5/2020 03:12:02

Now i have delete function:

; with todelete as (
          select *, 
               count(*) over (partition by label) as cnt, 
               lag(cost) over (partition by label order by time ASC) as lastcost
               ROW_NUMBER() over (partition by label order by time ASC) as r_number
          from Table1
         )
    delete from todelete 
        where cnt > 1 and r_number between 1 and (cnt/2)*2 and  cost=ISNULL(lastcost,cost)

With which I get mine they wanted a table:

label   cost   time
x3       20    14/5/2020 01:02:00
x2       30    15/5/2020 03:12:02

But the problem occurs when the original table looks like:

label   cost   time
x2       29    14/5/2020 01:00:00
x3       20    14/5/2020 01:02:00
x2       29    15/5/2020 03:12:02

Now Delete function(mentioned above)

I'll get a table:

 label     cost   time
    x3       20    14/5/2020 01:02:00

With the above delete function, both records for the label "X2" will be deleted, but I only want to delete the older one.


Solution

  • Nobody anything?

    I try this: I couldn't solve it. Here in this you see that it will delete both records for me (I just want the older one): https://rextester.com/TLLQ93275

    In this case, it works correctly, but if "x2" has the same price (for example, 29), it will also delete both entries. https://rextester.com/RHB70490

    UPDATE:

    I finally managed to solve the problem. I added another ranking function and conditioned it appropriately.

    ; with todelete as (
          select *, 
               count(*) over (partition by label) as cnt, 
         lag(cost) over (partition by label order by time ASC) as lastcost
         ,ROW_NUMBER() over (partition by label order by time ASC) as r_number
          ,ROW_NUMBER() over (partition by label order by time DESC) as r_number2
        ,RANK() over (partition by cost order by time asc) as TEST
        ,Row_NUMBER() over (partition by label order by TIME DESC) as TEST2
           from Table1
         )
    DELETE from todelete 
        where (cnt > 1 and r_number between 1 and (cnt/2)*2 and cost=ISNULL(lastcost,cost) AND TEST2 !=1)  OR (cnt>1 AND TEST2<>1 AND r_number2 != 1)
    

    For illustration, here: https://rextester.com/DONME54328