Search code examples
sqlsql-servert-sqlsql-server-2017

Identify Consecutive Chunks in SQL Server Table


I have this table:

ValueId bigint // (identity) item ID
ListId bigint // group ID
ValueDelta int // item value
ValueCreated datetime2 // item created

What I need is to find consecutive Values within the same Group ordered by Created, not ID. Created and ID are not guaranteed to be in the same order.

So the output should be:

ListID bigint
FirstId bigint // from this ID (first in LID with Value ordered by Date)
LastId bigint // to this ID (last in LID with Value ordered by Date)
ValueDelta int // all share this value
ValueCount // and this many occurrences (number of items between FirstId and LastId)

I can do this with Cursors but I'm sure that's not the best idea so I'm wondering if this can be done in a query.

Please, for the answer (if any), explain it a bit.

UPDATE: SQLfiddle basic data set


Solution

  • It does look like a gaps-and-island problem.

    Here is one way to do it. It would likely work faster than your variant.

    The standard idea for gaps-and-islands is to generate two sets of row numbers partitioning them in two ways. The difference between such row numbers (rn1-rn2) would remain the same within each consecutive chunk. Run the query below CTE-by-CTE and examine intermediate results to see what is going on.

    WITH
    CTE_RN
    AS
    (
        SELECT
            [ValueId]
            ,[ListId]
            ,[ValueDelta]
            ,[ValueCreated]
            ,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
            ,ROW_NUMBER() OVER (PARTITION BY ListID, [ValueDelta] ORDER BY ValueCreated) AS rn2
        FROM [Value]
    )
    SELECT
        ListID
        ,MIN(ValueID) AS FirstID
        ,MAX(ValueID) AS LastID
        ,MIN(ValueCreated) AS FirstCreated
        ,MAX(ValueCreated) AS LastCreated
        ,ValueDelta
        ,COUNT(*) AS ValueCount
    FROM CTE_RN
    GROUP BY
        ListID
        ,ValueDelta
        ,rn1-rn2
    ORDER BY
        FirstCreated
    ;
    

    This query produces the same result as yours on your sample data set.

    It is not quite clear whether FirstID and LastID can be MIN and MAX, or they indeed must be from the first and last rows (when ordered by ValueCreated). If you need really first and last, the query would become a bit more complicated.


    In your original sample data set the "first" and "min" for the FirstID are the same. Let's change the sample data set a little to highlight this difference:

    insert into [Value]
    ([ListId], [ValueDelta], [ValueCreated])
    values
    (1, 1, '2019-01-01 01:01:02'), -- 1.1
    (1, 0, '2019-01-01 01:02:01'), -- 2.1
    (1, 0, '2019-01-01 01:03:01'), -- 2.2
    (1, 0, '2019-01-01 01:04:01'), -- 2.3
    (1, -1, '2019-01-01 01:05:01'), -- 3.1
    (1, -1, '2019-01-01 01:06:01'), -- 3.2
    (1, 1, '2019-01-01 01:01:01'), -- 1.2
    (1, 1, '2019-01-01 01:08:01'), -- 4.2
    (2, 1, '2019-01-01 01:08:01') -- 5.1
    ;
    

    All I did is swapped the ValueCreated between the first and seventh rows, so now the FirstID of the first group is 7 and LastID is 1. Your query returns correct result. My simple query above doesn't.

    Here is the variant that produces correct result. I decided to use FIRST_VALUE and LAST_VALUE functions to get the appropriate IDs. Again, run the query CTE-by-CTE and examine intermediate results to see what is going on. This variant produces the same result as your query even with the adjusted sample data set.

    WITH
    CTE_RN
    AS
    (
        SELECT
            [ValueId]
            ,[ListId]
            ,[ValueDelta]
            ,[ValueCreated]
            ,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
            ,ROW_NUMBER() OVER (PARTITION BY ListID, ValueDelta ORDER BY ValueCreated) AS rn2
        FROM [Value]
    )
    ,CTE2
    AS
    (
        SELECT
            ValueId
            ,ListId
            ,ValueDelta
            ,ValueCreated
            ,rn1
            ,rn2
            ,rn1-rn2 AS Diff
            ,FIRST_VALUE(ValueID) OVER(
                PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstID
            ,LAST_VALUE(ValueID) OVER(
                PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastID
        FROM CTE_RN
    )
    SELECT
        ListID
        ,FirstID
        ,LastID
        ,MIN(ValueCreated) AS FirstCreated
        ,MAX(ValueCreated) AS LastCreated
        ,ValueDelta
        ,COUNT(*) AS ValueCount
    FROM CTE2
    GROUP BY
        ListID
        ,ValueDelta
        ,rn1-rn2
        ,FirstID
        ,LastID
    ORDER BY FirstCreated;