Search code examples
sqlsql-server

Merging sql row groups


My applications splits a single row data into different row chunks which are always in sorted order of startdate.

Where rowpart = 0 is the start and rowpart=2 is always the end rowpart=1 is the middle part,which can be repeated n no of times.

I need to return row in such form like startdate of rowpart=0 and enddate of rowpart=2(if present or else return enddate for rowpart )

  • Rowpart = 0 is the start of new row chunk
  • Rowpart = 2 is always the end of the chunk

Chunks can be spread across different dates.

+-----+-------------------------+-------------------------+----------+
| Id  |        startdate        |         enddate         | rowpart |
+-----+-------------------------+-------------------------+----------+
| 100 | 2016-11-30 00:00:00.000 | 2016-11-30 01:00:00.000 |        0 |
| 100 | 2016-11-30 02:00:00.000 | 2016-11-30 03:00:00.000 |        1 |
| 100 | 2016-11-30 10:00:00.000 | 2016-12-01 00:00:00.000 |        0 |
| 100 | 2016-12-01 02:00:00.000 | 2016-12-01 02:30:00.000 |        1 |
| 100 | 2016-12-01 10:00:00.000 | 2016-12-01 10:30:00.000 |        1 |
| 100 | 2016-12-01 16:00:00.000 | 2016-12-01 16:30:00.000 |        2 |
| 101 | 2016-12-11 10:00:00.000 | 2016-12-11 10:30:00.000 |        0 |
+-----+-------------------------+-------------------------+----------+

So the above table should return:

+-----+-------------------------+-------------------------+
| Id  |        startdate        |         enddate         |
+-----+-------------------------+-------------------------+
| 100 | 2016-11-30 00:00:00.000 | 2016-11-30 03:00:00.000 |
| 100 | 2016-12-30 10:00:00.000 | 2016-12-01 16:30:00.000 |
| 101 | 2016-12-11 10:00:00.000 | 2016-12-11 10:30:00.000 |
+-----+-------------------------+-------------------------+

Any help would be appreciated


Solution

  • This should work:

    ;WITH temp
    AS
    (
    SELECT  Id, startdate,enddate,rowpart,
            --Find out First Record
            CASE WHEN rowpart=0
                THEN 1
                ELSE 0
            END AS is_first,
            --Find out Last Record, Check if next rowpart is 0 or NULL:
            CASE WHEN COALESCE(LEAD(rowpart) OVER (ORDER BY Id, startdate),0) = 0 --Check if next rowpart is 0 or NULL
                THEN 1
                ELSE 0
            END AS is_last
    FROM    @tab
    )
    
    SELECT  DISTINCT
            Id,
            CASE WHEN is_first = 1
                THEN startdate
                ELSE LAG(startdate) OVER (ORDER BY Id, startdate) 
            END AS startdate,
            CASE WHEN is_last = 1
                THEN enddate
                ELSE LEAD(enddate) OVER (ORDER BY Id, startdate) 
            END AS enddate
    FROM    temp 
    WHERE   is_first = 1 OR is_last = 1
    ORDER BY Id, startdate
    

    What i try to do here: Inside the CTE i mark the first and the last record for each sequence. If rowpart=0 --> it's the first record. If the next record is null or the the rowpart of the next record is 0 then we have the last record.

    So when querying the CTE we can eliminate the "records in between". What remains are 1 or 2 records per sequence (the first and the last, in some cases this is the same record).

    Then we replace startdate with the startdate of the first record of the sequence and enddate with the enddate of the last record of the sequence.

    Eliminate duplicate values with DISTINCT and you get the desired output.

    This is a dirty piece of SQL, but at least it works ;-)

    If you didn't know SQL Servers LEAD and LAGfunction to access previous or following row values check this out: http://blog.sqlauthority.com/2013/09/22/sql-server-how-to-access-the-previous-row-and-next-row-value-in-select-statement/