I have this table:
ValueId bigint // (identity) item ID
ListId bigint // group ID
ValueDelta int // item value
ValueCreated datetime2 // item created
What I need is to find consecutive Values within the same Group ordered by Created, not ID. Created and ID are not guaranteed to be in the same order.
So the output should be:
ListID bigint
FirstId bigint // from this ID (first in LID with Value ordered by Date)
LastId bigint // to this ID (last in LID with Value ordered by Date)
ValueDelta int // all share this value
ValueCount // and this many occurrences (number of items between FirstId and LastId)
I can do this with Cursors but I'm sure that's not the best idea so I'm wondering if this can be done in a query.
Please, for the answer (if any), explain it a bit.
UPDATE: SQLfiddle basic data set
It does look like a gaps-and-island problem.
Here is one way to do it. It would likely work faster than your variant.
The standard idea for gaps-and-islands is to generate two sets of row numbers partitioning them in two ways. The difference between such row numbers (rn1-rn2
) would remain the same within each consecutive chunk. Run the query below CTE-by-CTE and examine intermediate results to see what is going on.
WITH
CTE_RN
AS
(
SELECT
[ValueId]
,[ListId]
,[ValueDelta]
,[ValueCreated]
,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
,ROW_NUMBER() OVER (PARTITION BY ListID, [ValueDelta] ORDER BY ValueCreated) AS rn2
FROM [Value]
)
SELECT
ListID
,MIN(ValueID) AS FirstID
,MAX(ValueID) AS LastID
,MIN(ValueCreated) AS FirstCreated
,MAX(ValueCreated) AS LastCreated
,ValueDelta
,COUNT(*) AS ValueCount
FROM CTE_RN
GROUP BY
ListID
,ValueDelta
,rn1-rn2
ORDER BY
FirstCreated
;
This query produces the same result as yours on your sample data set.
It is not quite clear whether FirstID
and LastID
can be MIN
and MAX
, or they indeed must be from the first and last rows (when ordered by ValueCreated). If you need really first and last, the query would become a bit more complicated.
In your original sample data set the "first" and "min" for the FirstID
are the same. Let's change the sample data set a little to highlight this difference:
insert into [Value]
([ListId], [ValueDelta], [ValueCreated])
values
(1, 1, '2019-01-01 01:01:02'), -- 1.1
(1, 0, '2019-01-01 01:02:01'), -- 2.1
(1, 0, '2019-01-01 01:03:01'), -- 2.2
(1, 0, '2019-01-01 01:04:01'), -- 2.3
(1, -1, '2019-01-01 01:05:01'), -- 3.1
(1, -1, '2019-01-01 01:06:01'), -- 3.2
(1, 1, '2019-01-01 01:01:01'), -- 1.2
(1, 1, '2019-01-01 01:08:01'), -- 4.2
(2, 1, '2019-01-01 01:08:01') -- 5.1
;
All I did is swapped the ValueCreated between the first and seventh rows, so now the FirstID
of the first group is 7
and LastID
is 1
. Your query returns correct result. My simple query above doesn't.
Here is the variant that produces correct result. I decided to use FIRST_VALUE
and LAST_VALUE
functions to get the appropriate IDs. Again, run the query CTE-by-CTE and examine intermediate results to see what is going on.
This variant produces the same result as your query even with the adjusted sample data set.
WITH
CTE_RN
AS
(
SELECT
[ValueId]
,[ListId]
,[ValueDelta]
,[ValueCreated]
,ROW_NUMBER() OVER (PARTITION BY ListID ORDER BY ValueCreated) AS rn1
,ROW_NUMBER() OVER (PARTITION BY ListID, ValueDelta ORDER BY ValueCreated) AS rn2
FROM [Value]
)
,CTE2
AS
(
SELECT
ValueId
,ListId
,ValueDelta
,ValueCreated
,rn1
,rn2
,rn1-rn2 AS Diff
,FIRST_VALUE(ValueID) OVER(
PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstID
,LAST_VALUE(ValueID) OVER(
PARTITION BY ListID, ValueDelta, rn1-rn2 ORDER BY ValueCreated
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastID
FROM CTE_RN
)
SELECT
ListID
,FirstID
,LastID
,MIN(ValueCreated) AS FirstCreated
,MAX(ValueCreated) AS LastCreated
,ValueDelta
,COUNT(*) AS ValueCount
FROM CTE2
GROUP BY
ListID
,ValueDelta
,rn1-rn2
,FirstID
,LastID
ORDER BY FirstCreated;