Search code examples
sqlnetezza

SQL for deleting records which are duplicate(consecutive), But storing there min date in Start date and max date as End Date


I have below input data in a Sample table:

S_ID  C_ID  E_ID   ST_DT        ED_DT
100   A     11AS   01/01/2020   05/01/2020
100   A     11AS   06/01/2020   10/01/2020
100   A     11AS   11/01/2020   15/01/2020
100   A     11BT   16/01/2020   20/01/2020
100   A     11AS   21/01/2020   27/01/2020
100   A     11AS   28/01/2020   30/01/2020

Expected Output in below table:

S_ID  C_ID  E_ID   ST_DT        ED_DT
100   A     11AS   01/01/2020   15/01/2020
100   A     11BT   16/01/2020   20/01/2020
100   A     11AS   21/01/2020   30/01/2020

Database: Netezza Note: These are sample records from data. There are other E_ID in the table's as well.

Thanks


Solution

  • This is a gaps-and-islands problem. Assuming you have no gaps, a simple way is the difference of row numbers:

    select s_id, c_id, e_id, min(st_dt), max(ed_dt)
    from (select t.*,
                 row_number() over (partition by s_id, c_id order by st_dt) as seqnum,
                 row_number() over (partition by s_id, c_id, e_id order by st_dt) as seqnum_2
          from t
         ) t
    group by s_id, c_id, e_id, (seqnum - seqnum_2);