Search code examples
sqlsql-serverwindow-functions

Trying to count consecutive values for user in slow changing dimension table if most recent record has a certain value


I have a table with users, dates of the entry, a flag and the type of the entry. It looks like this.

User InDate   Flag  Type
1   2023-06-01  E   A
2   2023-06-01  E   AS
3   2023-06-01  E   A
4   2023-06-01  I   NULL
1   2023-03-01  E   A
2   2023-03-01  E   AS
3   2023-03-01  E   A
4   2023-03-01  I   AS
1   2022-12-01  I   NULL
2   2022-12-01  E   AS
3   2022-12-01  E   A
4   2022-12-01  E   AS

I am trying to find the count of users with consecutive "AS" type up until the current date. In this example let's say I am running on 6/30/2023. My thought was to use the lag function to get the previous entries value by user based on date, compare that to the current value and then assign it a 1 and sum these for each user. This is the code I've tried:

WITH Lagged AS (
    SELECT [User],
           Indate,
           [Type],
           Flag,
           LAG([Type]) OVER (PARTITION BY [User] ORDER BY Indate) AS prev_val
    FROM #MYtest
    WHERE InDate >= '20220601'
    and InDate < getdate())

SELECT [User], COUNT(*)  as consecutive_count
FROM (
    SELECT [User],
           CASE WHEN [Type] = prev_val and [Type] like '%AS%' THEN 1 ELSE 0 END AS consecutive_indicator
    FROM Lagged
) AS T
WHERE consecutive_indicator = 1
GROUP BY [User];

However I am getting the result:

User consecutive_count
2   2
4   1

What I am trying to get (expected result):

User consecutive_count
2    3

Because user 2 is the only user WITH a most recent type of "AS". Should I just add 1 to the final count and then join the Lagged Table back to the original table to check that the most recent status is "AS"? Is there a better way to do this?

Any tips or advice is greatly appreciated. Code to recreate the data I'm using to test in SQL Server:

CREATE TABLE #MYTest([User] int, InDate date,Flag varchar(1),Type  varchar(2))
INSERT INTO #MYTest
values(1,'2023-06-01','E','A'),    
(2,'2023-06-01','E','AS'),    
(3,'2023-06-01','E','A'),    
(4,'2023-06-01','I',NULL),    
(1,'2023-03-01','E','A'),    
(2,'2023-03-01','E','AS'),    
(3,'2023-03-01','E','A'),    
(4,'2023-03-01','I','AS'),
(1,'2022-12-01','I',NULL),    
(2,'2022-12-01','E','AS'),    
(3,'2022-12-01','E','A'),    
(4,'2022-12-01','E','AS')

I also tried to join back to the original table to look only for those with a current status of 'AS' and adding 1 to all the counts to account for the current status, but ran into a corner case like this:

User,InDate,Flag,Type
5,2023-06-01,E,AS
5,2023-03-01,E,A
5,2022-12-01,E,AS
5,2022-09-01,E,AS

Which results in: User, consecutive_count 5, 2 Which is because the AS type for 12/1/2022 was the same as the 09/01/2022 flag resulting in 1 and then I added 1. Doh!


Solution

  • Just for clarification, I used the following criteria:

    • The most recent Type for the user needs to be %AS%
      • User 4 doesn't make the cut because the most recent Type is NULL
    • Get the count of all recent consecutive Type values that are %AS% until the first break from that criterion

    Based on that we just need to identify the most recent date where the %AS% criterion is not met and grab everything after that.

    In the example below, this approach is used with the sample data set you provided, but also with an extended set that includes some other arrangements of the Type values to validate it works as expected.

    SQL Fiddle example

    WITH MostRecentExclusion AS --find the first instance where a Type is not like '%AS%'
        (
            SELECT [User], 
                max(Indate) AS ExclusionRowDate
            FROM #MYTest 
            WHERE ISNULL([Type],'null') NOT LIKE '%AS%'
            AND InDate >= '20220601'
            AND InDate < getdate()
            GROUP BY [User]
        )
    SELECT mt.[User], COUNT(*)  as consecutive_count
    FROM #MYTest mt
    LEFT JOIN MostRecentExclusion mre
        ON mt.[User] = mre.[User]
    WHERE mt.Indate > ISNULL(mre.ExclusionRowDate,'01/01/1753 00:00:00') --just fill a NULL ExclusionRowID with a number unlikely to be reached
    GROUP BY mt.[User];
    

    Results with your sample data:

    User    consecutive_count
    2         3
    

    My sample data:

    CREATE TABLE #MYTest([User] int, InDate date,Flag varchar(1),Type  varchar(2))
    
    INSERT INTO #MYTest
    values(1,'2023-06-01','E','A'),    
    (2,'2023-06-01','E','AS'),    
    (3,'2023-06-01','E','A'),    
    (4,'2023-06-01','I',NULL),    
    (1,'2023-03-01','E','A'),    
    (2,'2023-03-01','E','AS'),    
    (3,'2023-03-01','E','A'),    
    (4,'2023-03-01','I','AS'),
    (1,'2022-12-01','I',NULL),    
    (2,'2022-12-01','E','AS'),    
    (3,'2022-12-01','E','A'),    
    (4,'2022-12-01','E','AS'),
    (5,'2023-06-01','I','AS'),
    (5,'2023-03-01','I',NULL),
    (5,'2022-12-01','I','AS'),
    (6,'2023-06-01','I','AS'),
    (6,'2023-03-01','I','A'),
    (6,'2022-12-01','I','AS'),
    (7,'2023-06-01','I','AS'),
    (7,'2023-03-01','I','AS'),
    (7,'2022-12-01','I',NULL)
    ;
    

    Results with my sample data:

    User    consecutive_count
    2        3
    5        1
    6        1
    7        2