Every time we receive a new batch of data, we load it into a table that stores every dataset we have previously received. Each new batch is given a new `batch_id`. The only two other columns are `item_id` and `cost`. I want to build a report that describes how many new items were added (based on `item_id`), how many items were removed, and the net change in items in the newest batch. With this in mind, I want to compare each batch against the batch just before it: if I have 3 `batch_id`s, then batch 3 compares to batch 2 and batch 2 compares to batch 1, while batch 1 has nothing to compare against. I am able to do this with the script below, but only for the pair of batches that I specify, and with hundreds of batches this would be mind-numbing. Is there a way to make this more dynamic in SQL Server? I have provided a sample dataset and anticipated results below.
```sql
WITH b1 AS
    (SELECT *
     FROM example_cumulative
     WHERE batch_id = 2),
b2 AS
    (SELECT *
     FROM example_cumulative
     WHERE batch_id = 3)
SELECT SUM(CASE WHEN b1.item_id IS NULL THEN 1 ELSE 0 END) [new items],
       SUM(CASE WHEN b2.item_id IS NULL THEN 1 ELSE 0 END) [removed items],
       COUNT(b2.item_id) - COUNT(b1.item_id) [net change in item volume]
FROM b1
FULL OUTER JOIN b2 ON b1.item_id = b2.item_id;
```
Sample data:

```sql
CREATE TABLE example_cumulative
(
    batch_id INT   NOT NULL,
    item_id  INT   NOT NULL,
    cost     FLOAT NOT NULL
);
```
```sql
INSERT INTO example_cumulative (batch_id, item_id, cost)
VALUES (1, 10, 100), (1, 11, 100), (1, 12, 106), (1, 13, 142), (1, 14, 152), (1, 15, 154),
       (2, 10, 110), (2, 11, 140), (2, 12, 100), (2, 16, 100),
       (3, 10, 110), (3, 11, 100), (3, 16, 195), (3, 17, 102), (3, 18, 101);
```
You could extend your current `full join` logic by dynamically joining on consecutive `batch_id`s, like so:
```sql
select
    coalesce(e1.batch_id, e0.batch_id + 1) batch_id,
    sum(case when e0.item_id is null then 1 else 0 end) new_items,
    sum(case when e1.item_id is null then 1 else 0 end) removed_items,
    count(e1.item_id) - count(e0.item_id) net_change
from example_cumulative e1
full join example_cumulative e0
    on  e1.item_id  = e0.item_id
    and e1.batch_id = e0.batch_id + 1
where coalesce(e1.batch_id, e0.batch_id + 1) <= (select max(batch_id) from example_cumulative)
group by coalesce(e1.batch_id, e0.batch_id + 1)
```
The `where` clause prevents the query from returning an additional, unwanted record whose `batch_id` is greater than the maximum available value.
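To see exactly what that clause filters out, you can run the same join without it; the rows of the last batch have no `e1` partner in a (non-existent) next batch, so they surface as an extra group one past the maximum `batch_id`. This is just a quick diagnostic check, not part of the solution:

```sql
-- Same join as above, but without the where clause: batch 3's rows
-- fall through as e0-only matches and form a phantom group with
-- batch_id = 4, where every item would be counted as "removed".
select
    coalesce(e1.batch_id, e0.batch_id + 1) batch_id,
    count(*) rows_in_group
from example_cumulative e1
full join example_cumulative e0
    on  e1.item_id  = e0.item_id
    and e1.batch_id = e0.batch_id + 1
group by coalesce(e1.batch_id, e0.batch_id + 1);
-- returns groups for batch_id 1, 2, 3 plus an unwanted batch_id 4
```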
For your sample data, this returns:
batch_id | new_items | removed_items | net_change
-------: | --------: | ------------: | ---------:
       1 |         6 |             0 |          6
       2 |         1 |             3 |         -2
       3 |         2 |             1 |          1
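One caveat: the `e0.batch_id + 1` join condition assumes `batch_id`s are consecutive integers with no gaps. If a batch can ever be skipped or deleted, a sketch of a gap-tolerant variation is to renumber the batches with `DENSE_RANK()` first and join on that sequence instead (the `numbered` CTE and `rn` column are my own names, not from the question):

```sql
-- Renumber batches 1..n in batch_id order, then apply the same
-- consecutive-pair full join against the dense rank instead of the
-- raw batch_id, so gaps in batch_id no longer break the pairing.
with numbered as (
    select item_id,
           batch_id,
           dense_rank() over (order by batch_id) rn
    from example_cumulative
)
select
    coalesce(e1.rn, e0.rn + 1) batch_seq,
    sum(case when e0.item_id is null then 1 else 0 end) new_items,
    sum(case when e1.item_id is null then 1 else 0 end) removed_items,
    count(e1.item_id) - count(e0.item_id) net_change
from numbered e1
full join numbered e0
    on  e1.item_id = e0.item_id
    and e1.rn      = e0.rn + 1
where coalesce(e1.rn, e0.rn + 1) <= (select max(rn) from numbered)
group by coalesce(e1.rn, e0.rn + 1);
```

For your sample data (where the `batch_id`s are already consecutive) this produces the same result as the query above, just keyed by the batch's position in sequence rather than its raw `batch_id`.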