Search code examples
sqlsql-servermultiple-columnsranksubsequence

How to rank subsequent data with gaps


I have this table that I want to rank and update the plan_number column. Using DENSE_RANK I have managed to group most of the data but some of the accounts have an amount that is the same before and after a gap they will be grouped together.

CREATE TABLE dbo.PlanItem(
    plan_account                INT
,   plan_sequence               INT
,   plan_number                 INT
,   plan_amount                 DECIMAL(16,2)                               
,   plan_due_date               DATETIME                                            
    )

INSERT INTO dbo.PlanItem VALUES(12159875,1223931,0,9334.00,'2014-07-31')
INSERT INTO dbo.PlanItem VALUES(12159875,1223932,0,160584.00,'2014-08-31')
INSERT INTO dbo.PlanItem VALUES(12159875,1223933,0,9334.00,'2014-09-30')
INSERT INTO dbo.PlanItem VALUES(12159875,1223934,0,9334.00,'2014-10-31')
INSERT INTO dbo.PlanItem VALUES(12159875,1223935,0,9334.00,'2014-11-30')
INSERT INTO dbo.PlanItem VALUES(12159875,1223936,0,9334.00,'2014-12-31')
INSERT INTO dbo.PlanItem VALUES(12159875,1223937,0,9334.00,'2015-01-31')
INSERT INTO dbo.PlanItem VALUES(12159875,1223938,0,9334.00,'2015-02-28')
INSERT INTO dbo.PlanItem VALUES(12159875,1223939,0,9334.00,'2015-03-31')

I would like this table to be ranked like this:

12159875,1223931,1,9334.00,'2014-07-31'
12159875,1223932,2,160584.00,'2014-08-31'
12159875,1223933,3,9334.00,'2014-09-30'
12159875,1223934,3,9334.00,'2014-10-31'
12159875,1223935,3,9334.00,'2014-11-30'
12159875,1223936,3,9334.00,'2014-12-31'
12159875,1223937,3,9334.00,'2015-01-31'
12159875,1223938,3,9334.00,'2015-02-28'
12159875,1223939,3,9334.00,'2015-03-31'

Grouping should be based on subsequent plan_amounts that are the same. And ordered oldest to newest os possible by lowest to highet plan_sequence.

I have tried a simple DENS_RANK which works fine for most of the data. But when the data has caps and the data before and after the cap are the same they will be grouped together.

WITH CTE_TEST
AS
(
SELECT *,
    DENSE_RANK() OVER (PARTITION BY plan_account ORDER BY plan_amount) AS Dense
FROM dbo.PlanItem
)

UPDATE dbo.PlanItem 
SET plan_number = Dense
FROM CTE_TEST
JOIN dbo.PlanItem ON dbo.PlanItem.plan_sequence = CTE_TEST.plan_sequence
GO

This is the output of select * from dbo.PlanItem order by plan_account

12159875,1223933,1,9334.00,'2014-09-30'
12159875,1223934,1,9334.00,'2014-10-31'
12159875,1223935,1,9334.00,'2014-11-30'
12159875,1223936,1,9334.00,'2014-12-31'
12159875,1223937,1,9334.00,'2015-01-31'
12159875,1223938,1,9334.00,'2015-02-28'
12159875,1223939,1,9334.00,'2015-03-31'
12159875,1223931,1,9334.00,'2014-07-31'
12159875,1223932,2,160584.00,'2014-08-31'

Solution

  • This is a type of gaps-and-islands problem.

    For this, peak at the previous value using lag() and then do a conditional cumulative sum:

    select pi.*,
           sum(case when prev_pa = plan_amount then 0 else 1 end) over (partition by plan_account order by plan_sequence) as your_rank
    from (select pi.*, lag(plan_amount) over (partition by plan_account order by plan_sequence) as prev_pa
          from planitem pi
         ) pi;
    

    Here is a db<>fiddle. Note that this uses SQL Server, which is consistent with your syntax but not with the tags.