Search code examples
sql-server-2014data-partitioninglead

Date difference over consecutive rows filtered to one instance of consecutive values using LEAD function


Basically the result i need is price change ranges for each item, i need to extract an item price and the date of the transaction with an end date set to the next time that item price is changed.

Given this table

create table myTable(id int, Price decimal(10,6), StartDate datetime)
insert into myTable
select 1, 92.576842, '2018-04-06 23:00:00.000' union all
select 1, 92.700000, '2018-04-12 21:39:00.000' union all
select 1, 92.700000, '2018-04-26 00:01:00.000' union all
select 1, 92.700000, '2018-06-04 23:20:00.000' union all
select 1, 116.700000, '2018-07-04 21:38:00.000' union all
select 1, 116.700000, '2018-08-01 22:31:31.000' union all
select 1, 118.610597, '2018-08-13 23:34:22.000' union all
select 1, 116.700000, '2018-09-05 23:52:28.000'

And this query

select id, Price, StartDate, LEAD(StartDate) over (partition by id order by id, StartDate) endDate
from myTable
order by id, StartDate

Query Results:

id  Price       StartDate               endDate
1   92.576842   2018-04-06 23:00:00.000 2018-04-12 21:39:00.000
1   92.700000   2018-04-12 21:39:00.000 2018-04-26 00:01:00.000
1   92.700000   2018-04-26 00:01:00.000 2018-06-04 23:20:00.000
1   92.700000   2018-06-04 23:20:00.000 2018-07-04 21:38:00.000
1   116.700000  2018-07-04 21:38:00.000 2018-08-01 22:31:31.000
1   116.700000  2018-08-01 22:31:31.000 2018-08-13 23:34:22.000
1   118.610597  2018-08-13 23:34:22.000 2018-09-05 23:52:28.000
1   116.700000  2018-09-05 23:52:28.000 NULL

how to get only the earliest date for consecutive Price instance

the result should be as below - note that last row is a duplicated Price yet it is required

id  Price       StartDate               EndDate
1   92.576842   2018-04-06 23:00:00.000 2018-04-12 21:39:00.000
1   92.700000   2018-04-12 21:39:00.000 2018-07-04 21:38:00.000
1   116.700000  2018-07-04 21:38:00.000 2018-08-13 23:34:22.000
1   118.610597  2018-08-13 23:34:22.000 2018-09-05 23:52:28.000
1   116.700000  2018-09-05 23:52:28.000 NULL

Solution

  • This an approach for "islands" that employs 2 row_number() calculations that give all rows in an "island" a shared id, then it is possible to group by that id:

    select
        id, Price, group_id, min(StartDate) StartDate, max(enddate) enddate
    from (
            select id, Price, StartDate
                , LEAD(StartDate,1,DATEADD(YEAR,1,StartDate)) over (partition by id order by id,StartDate) endDate
                , row_number() over(partition by id order by StartDate ASC) 
                  - row_number() over(partition by id, price order by StartDate ASC) AS group_id
            from myTable
          ) d
    group by
        id, Price, group_id
    order by
        id,StartDate
    ;
    
    | id |      Price | group_id |            StartDate |              enddate |
    |----|------------|----------|----------------------|----------------------|
    |  1 |  92.576842 |        0 | 2018-04-06T23:00:00Z | 2018-04-12T21:39:00Z |
    |  1 |       92.7 |        1 | 2018-04-12T21:39:00Z | 2018-07-04T21:38:00Z |
    |  1 |      116.7 |        4 | 2018-07-04T21:38:00Z | 2018-08-13T23:34:22Z |
    |  1 | 118.610597 |        6 | 2018-08-13T23:34:22Z | 2018-09-05T23:52:28Z |
    |  1 |      116.7 |        5 | 2018-09-05T23:52:28Z | 2019-09-05T23:52:28Z |
    

    http://sqlfiddle.com/#!18/293d1/6

    If you want NULL as the last enddate, change the lead() so it does not supply a default value

    LEAD(StartDate,1)) over (partition by id order by id,StartDate) endDate