Search code examples
sqlsql-servergaps-and-islands

How can I delete trailing contiguous records in a partition with a particular value?


I'm using the latest version of SQL Server and have the following problem. Given the table below, the requirement, quite simply, is to delete "trailing" records in each _category partition that have _value = 0. Trailing in this context means, when the records are placed in _date order, any series or contiguous block of records with _value = 0 at the end of the list should be deleted. Records with _value = 0 that have subsequent records in the partition with some non-zero value should stay.

create table #x (_id int identity, _category int, _date date, _value int)

insert into #x values (1, '2022-10-01', 12)
insert into #x values (1, '2022-10-03', 0)
insert into #x values (1, '2022-10-04', 10)
insert into #x values (1, '2022-10-06', 11)
insert into #x values (1, '2022-10-07', 10)

insert into #x values (2, '2022-10-01', 1)
insert into #x values (2, '2022-10-02', 0)
insert into #x values (2, '2022-10-05', 19)
insert into #x values (2, '2022-10-10', 18)
insert into #x values (2, '2022-10-12', 0)
insert into #x values (2, '2022-10-13', 0)
insert into #x values (2, '2022-10-15', 0)

insert into #x values (3, '2022-10-02', 10)
insert into #x values (3, '2022-10-03', 0)
insert into #x values (3, '2022-10-05', 0)
insert into #x values (3, '2022-10-06', 12)
insert into #x values (3, '2022-10-08', 0)

I see a few ways to do it. The brute force way is to to run the records through a cursor in date order, and grab the ID of any record where _value = 0 and see if it holds until the category changes. I'm trying to avoid T-SQL though if I can do it in a query.

To that end, I thought I could apply some gaps and islands trickery and do something with window functions. I feel like there might be a way to leverage last_value() for this, but so far I only see it useful in identifying partitions that have the criteria, not so much in helping me get the ID's of the records to delete.

The desired result is the deletion of records 10, 11, 12 and 17.

Appreciate any help.


Solution

  • Assuming that all _values are greater than or equal to 0 you can use MAX() window function in an updatable CTE:

    WITH cte AS (
      SELECT *, 
             MAX(_value) OVER (
               PARTITION BY _category 
               ORDER BY _date 
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
             ) max
      FROM #x
    )  
    DELETE FROM cte
    WHERE max = 0;
    

    If there are negative _values use MAX(ABS(_value)) instead of MAX(_value).

    See the demo.