Search code examples
sqlsql-serversql-server-2016

Rank data with history of recurrence


Lets say I'm tracking a user's location, and I capture the following information:

  1. Date
  2. UserId
  3. CurrentLocation

I can do a fairly easy transformation on this data to form a new table to get their last known position, if one exists, which I'll include below to show us moving from one point to another.

I want to create a grouping for each user's current position, and increment it whenever their position has changed from a previous value. If a user leaves a position, and later comes back to it, I want that to be treated as a new value, and not lump it in with the group when they were first at this location.

The problem with using RANK or DENSE_RANK to do this is that I'm ordering by the currentPos, which obviously won't work.

I thought I could use LAG() to look at the previous data, but this doesn't allow you to aggregate the previous record's LAG() with the current row.

Here's an example using RANK()

WITH dummyData(id, occuredOn, userId, currentPos, lastPos) AS (
    SELECT 01, '2021-01-01 00:00:00', 23, 'A', null
    UNION ALL
    SELECT 22, '2021-01-01 01:30:00', 23, 'A', 'A'
    UNION ALL
    SELECT 43, '2021-01-01 04:00:00', 23, 'B', 'A'
    UNION ALL
    SELECT 55, '2021-01-02 00:00:00', 23, 'C', 'B'
    UNION ALL
    SELECT 59, '2021-01-02 04:40:00', 23, 'B', 'C'
    UNION ALL
    SELECT 68, '2021-01-02 08:00:00', 23, 'C', 'B'
    UNION ALL
    SELECT 69, '2021-01-02 09:00:00', 23, 'D', 'C'

    UNION ALL
    SELECT 11, '2021-01-01 01:00:00', 43, 'X', 'X'
    UNION ALL
    SELECT 18, '2021-01-01 02:00:00', 43, 'Y', 'X'
    UNION ALL
    SELECT 32, '2021-01-02 00:00:00', 43, 'Z', 'Y'
)
SELECT *
    , DENSE_RANK() OVER (PARTITION BY userId ORDER BY currentPos) locationChangeGroup
FROM dummyData
ORDER BY userId ASC, occuredOn ASC

Here's what it outputs

id occurredOn userId currentPos lastPos locationChangeGroup
01 2021-01-01 00:00:00 23 A NULL 1
22 2021-01-01 01:30:00 23 A A 1
43 2021-01-01 04:00:00 23 B A 2
55 2021-01-02 00:00:00 23 C B 3
59 2021-01-02 04:40:00 23 B C 2
68 2021-01-02 08:00:00 23 C B 3
69 2021-01-02 09:00:00 23 D C 4
11 2021-01-01 01:00:00 43 X X 1
18 2021-01-01 02:00:00 43 Y X 2
32 2021-01-02 00:00:00 43 Z Y 3

Here's what I want

id occurredOn userId currentPos lastPos locationChangeGroup
01 2021-01-01 00:00:00 23 A NULL 1
22 2021-01-01 01:30:00 23 A A 1
43 2021-01-01 04:00:00 23 B A 2
55 2021-01-02 00:00:00 23 C B 3
59 2021-01-02 04:40:00 23 B C 4
68 2021-01-02 08:00:00 23 C B 5
69 2021-01-02 09:00:00 23 D C 6
11 2021-01-01 01:00:00 43 X X 1
18 2021-01-01 02:00:00 43 Y X 2
32 2021-01-02 00:00:00 43 Z Y 3

I know I could do this with a CURSOR, but I'd rather not resort to that.

T-SQL is fine, but I'm trying to stay away from any stored procs or functions, as it will require a larger effort of generating database migration scripts and the rigamarole of our processes that entails.

Any suggestions?


Solution

  • I think this is a gap-and-islands problem. For this purpose, you can use lag() and a cumulative sum:

    select dd.*,
           sum(case when prev_currentpos = currentpos then 0 else 1 end) over
                (partition by userid
                 order by occurredon
                ) as locationChangeGroup
    from (select dd.*,
                 lag(currentpos) over (partition by userid order by occurredon) as prev_currentpos
          from dummydata dd
         ) dd