I have a table with users, dates of the entry, a flag and the type of the entry. It looks like this.
User InDate Flag Type
1 2023-06-01 E A
2 2023-06-01 E AS
3 2023-06-01 E A
4 2023-06-01 I NULL
1 2023-03-01 E A
2 2023-03-01 E AS
3 2023-03-01 E A
4 2023-03-01 I AS
1 2022-12-01 I NULL
2 2022-12-01 E AS
3 2022-12-01 E A
4 2022-12-01 E AS
I am trying to find the count of users with consecutive "AS" type up until the current date. In this example let's say I am running on 6/30/2023. My thought was to use the lag function to get the previous entries value by user based on date, compare that to the current value and then assign it a 1 and sum these for each user. This is the code I've tried:
WITH Lagged AS (
SELECT [User],
Indate,
[Type],
Flag,
LAG([Type]) OVER (PARTITION BY [User] ORDER BY Indate) AS prev_val
FROM #MYtest
WHERE InDate >= '20220601'
and InDate < getdate())
SELECT [User], COUNT(*) as consecutive_count
FROM (
SELECT [User],
CASE WHEN [Type] = prev_val and [Type] like '%AS%' THEN 1 ELSE 0 END AS consecutive_indicator
FROM Lagged
) AS T
WHERE consecutive_indicator = 1
GROUP BY [User];
However I am getting the result:
User consecutive_count
2 2
4 1
What I am trying to get (expected result):
User consecutive_count
2 3
Because user 2 is the only user WITH a most recent type of "AS". Should I just add 1 to the final count and then join the Lagged Table back to the original table to check that the most recent status is "AS"? Is there a better way to do this?
Any tips or advice is greatly appreciated. Code to recreate the data I'm using to test in SQL Server:
CREATE TABLE #MYTest([User] int, InDate date,Flag varchar(1),Type varchar(2))
INSERT INTO #MYTest
values(1,'2023-06-01','E','A'),
(2,'2023-06-01','E','AS'),
(3,'2023-06-01','E','A'),
(4,'2023-06-01','I',NULL),
(1,'2023-03-01','E','A'),
(2,'2023-03-01','E','AS'),
(3,'2023-03-01','E','A'),
(4,'2023-03-01','I','AS'),
(1,'2022-12-01','I',NULL),
(2,'2022-12-01','E','AS'),
(3,'2022-12-01','E','A'),
(4,'2022-12-01','E','AS')
I also tried to join back to the original table to look only for those with a current status of 'AS' and adding 1 to all the counts to account for the current status, but ran into a corner case like this:
User,InDate,Flag,Type
5,2023-06-01,E,AS
5,2023-03-01,E,A
5,2022-12-01,E,AS
5,2022-09-01,E,AS
Which results in: User, consecutive_count 5, 2 Which is because the AS type for 12/1/2022 was the same as the 09/01/2022 flag resulting in 1 and then I added 1. Doh!
Just for clarification, I used the following criteria:
Based on that we just need to identify the most recent date where the %AS% criterion is not met and grab everything after that.
In the example below, this approach is used with the sample data set you provided, but also with an extended set that includes some other arrangements of the Type values to validate it works as expected.
WITH MostRecentExclusion AS --find the first instance where a Type is not like '%AS%'
(
SELECT [User],
max(Indate) AS ExclusionRowDate
FROM #MYTest
WHERE ISNULL([Type],'null') NOT LIKE '%AS%'
AND InDate >= '20220601'
AND InDate < getdate()
GROUP BY [User]
)
SELECT mt.[User], COUNT(*) as consecutive_count
FROM #MYTest mt
LEFT JOIN MostRecentExclusion mre
ON mt.[User] = mre.[User]
WHERE mt.Indate > ISNULL(mre.ExclusionRowDate,'01/01/1753 00:00:00') --just fill a NULL ExclusionRowID with a number unlikely to be reached
GROUP BY mt.[User];
Results with your sample data:
User consecutive_count
2 3
My sample data:
CREATE TABLE #MYTest([User] int, InDate date,Flag varchar(1),Type varchar(2))
INSERT INTO #MYTest
values(1,'2023-06-01','E','A'),
(2,'2023-06-01','E','AS'),
(3,'2023-06-01','E','A'),
(4,'2023-06-01','I',NULL),
(1,'2023-03-01','E','A'),
(2,'2023-03-01','E','AS'),
(3,'2023-03-01','E','A'),
(4,'2023-03-01','I','AS'),
(1,'2022-12-01','I',NULL),
(2,'2022-12-01','E','AS'),
(3,'2022-12-01','E','A'),
(4,'2022-12-01','E','AS'),
(5,'2023-06-01','I','AS'),
(5,'2023-03-01','I',NULL),
(5,'2022-12-01','I','AS'),
(6,'2023-06-01','I','AS'),
(6,'2023-03-01','I','A'),
(6,'2022-12-01','I','AS'),
(7,'2023-06-01','I','AS'),
(7,'2023-03-01','I','AS'),
(7,'2022-12-01','I',NULL)
;
Results with my sample data:
User consecutive_count
2 3
5 1
6 1
7 2