Lets say I'm tracking a user's location, and I capture the following information:
I can do a fairly easy transformation on this data to form a new table to get their last known position, if one exists, which I'll include below to show us moving from one point to another.
I want to create a grouping for each user's current position, and increment it whenever their position has changed from a previous value. If a user leaves a position, and later comes back to it, I want that to be treated as a new value, and not lump it in with the group when they were first at this location.
The problem with using RANK or DENSE_RANK to do this is that I'm ordering by the currentPos, which obviously won't work.
I thought I could use LAG() to look at the previous data, but this doesn't allow you to aggregate the previous record's LAG() with the current row.
Here's an example using RANK()
WITH dummyData(id, occuredOn, userId, currentPos, lastPos) AS (
SELECT 01, '2021-01-01 00:00:00', 23, 'A', null
UNION ALL
SELECT 22, '2021-01-01 01:30:00', 23, 'A', 'A'
UNION ALL
SELECT 43, '2021-01-01 04:00:00', 23, 'B', 'A'
UNION ALL
SELECT 55, '2021-01-02 00:00:00', 23, 'C', 'B'
UNION ALL
SELECT 59, '2021-01-02 04:40:00', 23, 'B', 'C'
UNION ALL
SELECT 68, '2021-01-02 08:00:00', 23, 'C', 'B'
UNION ALL
SELECT 69, '2021-01-02 09:00:00', 23, 'D', 'C'
UNION ALL
SELECT 11, '2021-01-01 01:00:00', 43, 'X', 'X'
UNION ALL
SELECT 18, '2021-01-01 02:00:00', 43, 'Y', 'X'
UNION ALL
SELECT 32, '2021-01-02 00:00:00', 43, 'Z', 'Y'
)
SELECT *
, DENSE_RANK() OVER (PARTITION BY userId ORDER BY currentPos) locationChangeGroup
FROM dummyData
ORDER BY userId ASC, occuredOn ASC
Here's what it outputs
id | occurredOn | userId | currentPos | lastPos | locationChangeGroup |
---|---|---|---|---|---|
01 | 2021-01-01 00:00:00 | 23 | A | NULL | 1 |
22 | 2021-01-01 01:30:00 | 23 | A | A | 1 |
43 | 2021-01-01 04:00:00 | 23 | B | A | 2 |
55 | 2021-01-02 00:00:00 | 23 | C | B | 3 |
59 | 2021-01-02 04:40:00 | 23 | B | C | 2 |
68 | 2021-01-02 08:00:00 | 23 | C | B | 3 |
69 | 2021-01-02 09:00:00 | 23 | D | C | 4 |
11 | 2021-01-01 01:00:00 | 43 | X | X | 1 |
18 | 2021-01-01 02:00:00 | 43 | Y | X | 2 |
32 | 2021-01-02 00:00:00 | 43 | Z | Y | 3 |
Here's what I want
id | occurredOn | userId | currentPos | lastPos | locationChangeGroup |
---|---|---|---|---|---|
01 | 2021-01-01 00:00:00 | 23 | A | NULL | 1 |
22 | 2021-01-01 01:30:00 | 23 | A | A | 1 |
43 | 2021-01-01 04:00:00 | 23 | B | A | 2 |
55 | 2021-01-02 00:00:00 | 23 | C | B | 3 |
59 | 2021-01-02 04:40:00 | 23 | B | C | 4 |
68 | 2021-01-02 08:00:00 | 23 | C | B | 5 |
69 | 2021-01-02 09:00:00 | 23 | D | C | 6 |
11 | 2021-01-01 01:00:00 | 43 | X | X | 1 |
18 | 2021-01-01 02:00:00 | 43 | Y | X | 2 |
32 | 2021-01-02 00:00:00 | 43 | Z | Y | 3 |
I know I could do this with a CURSOR, but I'd rather not resort to that.
T-SQL is fine, but I'm trying to stay away from any stored procs or functions, as it will require a larger effort of generating database migration scripts and the rigamarole of our processes that entails.
Any suggestions?
I think this is a gap-and-islands problem. For this purpose, you can use lag()
and a cumulative sum:
select dd.*,
sum(case when prev_currentpos = currentpos then 0 else 1 end) over
(partition by userid
order by occurredon
) as locationChangeGroup
from (select dd.*,
lag(currentpos) over (partition by userid order by occurredon) as prev_currentpos
from dummydata dd
) dd