Search code examples
sqlsql-servert-sqlwindow-functionsgaps-and-islands

Calculating a running total of when a value changes over a partition


I am having trouble figuring out how to write a window function that solves my problem. I am quite the novice at window functions, but I think one could be written to meet my needs.

Problem Statement:
I want to calculate a transfer sequence showing when person has changed locations based on the corresponding location ID over time.

Sample Data (Table1)

+----------+------------+-----------+---------+
| PersonID | LocationID | Date      | Time    |
+----------+------------+-----------+---------+
| 12       | A          | 6/17/2020 | 12:00PM |
+----------+------------+-----------+---------+
| 12       | A          | 6/18/2020 | 1:00PM  |
+----------+------------+-----------+---------+
| 12       | B          | 6/18/2020 | 6:00AM  |
+----------+------------+-----------+---------+
| 12       | C          | 6/19/2020 | 3:00PM  |
+----------+------------+-----------+---------+
| 13       | A          | 6/16/2020 | 8:00AM  |
+----------+------------+-----------+---------+
| 13       | A          | 6/16/2020 | 11:00AM |
+----------+------------+-----------+---------+
| 13       | A          | 6/16/2020 | 12:00AM |
+----------+------------+-----------+---------+
| 13       | B          | 6/16/2020 | 4:00PM  |
+----------+------------+-----------+---------+

Expected Results

+----------+------------+-----------+---------+-------------------+
| PersonID | LocationID | Date      | Time    | Transfer Sequence |
+----------+------------+-----------+---------+-------------------+
| 12       | A          | 6/17/2020 | 12:00PM | 1                 |
+----------+------------+-----------+---------+-------------------+
| 12       | A          | 6/18/2020 | 1:00PM  | 1                 |
+----------+------------+-----------+---------+-------------------+
| 12       | B          | 6/18/2020 | 6:00AM  | 2                 |
+----------+------------+-----------+---------+-------------------+
| 12       | C          | 6/19/2020 | 3:00PM  | 3                 |
+----------+------------+-----------+---------+-------------------+
| 13       | A          | 6/16/2020 | 8:00AM  | 1                 |
+----------+------------+-----------+---------+-------------------+
| 13       | A          | 6/16/2020 | 11:00AM | 1                 |
+----------+------------+-----------+---------+-------------------+
| 13       | A          | 6/16/2020 | 12:00AM | 1                 |
+----------+------------+-----------+---------+-------------------+
| 13       | B          | 6/16/2020 | 4:00PM  | 2                 |
+----------+------------+-----------+---------+-------------------+

What I Tried

SELECT 
     [t1].[PersonID]
    ,[t1].[LocationID]
    ,[t1].[Date]
    ,[t1].[Time]
    ,DENSE_RANK() 
         OVER( 
           partition BY [t1].[PersonID], [t1].[LocationID] 
           ORDER BY [t1].[Date] ASC, [t1].[Time] ASC) AS 
       [Transfer Sequence]


FROM Table1 [t1]

Unfortunately, I believe DENSE_RANK() is assigning a rank regardless of whether the value of LocationID has changed. I need a function that will only add one to the sequence when the LocationID has changed.

Any help would be greatly appreciated.

Thank you!


Solution

  • You want to put "adjacent" rows in the same group. Straigt window functions cannot do that for you - we would need to use a gaps-and-island technique:

    select 
        t.*, 
        sum(case when locationID = lagLocationID then 0 else 1 end) 
            over(partition by personID order by date, time) 
            as transfert_sequence
    from (
        select 
            t.*, 
            lag(locationID) 
                over(partition by personID order by date, time) 
                as lagLocationID
        from mytable t
    ) t
    

    The idea is to compute a window sum that increments everytime the locationID changes.

    Note that this would properly handle the case when a person comes back to a location they have already been before.