sql sql-server select aggregate-functions

Counting sequenced records in ranges, based on grouped varchar values

I'm trying to calculate count of same values, in a certain pattern, where the pattern might occur again.

To simply, please see below data, and expected result. Solution is implemented in SQL Server 2016 and above ...

DECLARE @dataTable TABLE
                   (
                       ts DATETIME, 
                       prop1 VARCHAR(4), 
                       prop2 VARCHAR(2)
                   );

INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:51:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:50:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:49:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:48:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:47:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:46:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:45:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:44:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:43:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:41:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:40:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:39:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:38:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:37:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:36:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:35:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:34:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:33:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:31:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:30:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:29:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:28:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:27:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:26:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:25:00', 'CCCC', '02');

SELECT 
    *, 
    MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC), 
    COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM 
    @dataTable 
ORDER BY 
    ts DESC;

The blocks are sequenced based on ts column.

The expected result should look like:

MAX_OF_PROP1_IN_BLOCK	PROP1	PROP2	COUNT_PROP1_IN_BLOCK	COUNT_OF_PROP2_IN_BLOCK
2023-05-01 10:51:00	AAAA	01	3	3
2023-05-01 10:48:00	BBBB	01	3	3
2023-05-01 10:45:00	AAAA	01	4	2
2023-05-01 10:45:00	AAAA	02	4	2
2023-05-01 10:41:00	CCCC	01	3	2
2023-05-01 10:41:00	CCCC	02	3	1
2023-05-01 10:38:00	BBBB	01	3	3
2023-05-01 10:35:00	AAAA	01	2	2
2023-05-01 10:33:00	CCCC	02	9	2
2023-05-01 10:33:00	CCCC	01	9	5
2023-05-01 10:33:00	CCCC	02	9	2

I've tried experimenting with window functions, but I'm keep getting always count/max values based on prop1/prop2 within whole scope, and not in ranges defined by prop1/prop2 values. It is possible to preorganized data with CTE query before...

SELECT * 
, MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC)
, COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM @dataTable
ORDER BY ts DESC;

I believe the solution is quite simple, but after several tries, and searching the internet, I could not find appropriate example. I believe, RANGE should be used, but I could not get this to work.

Any help/advice would be greatly appreciated.

Solution

On the assumption that having two rows with 10:32:00 is a typo, you have a gaps and islands problem.

The method I prefer is to set a flag by using lag whenever the value in the next row is different, then sum this to provide a running total that represents each island; with this in place you can then count and aggregate the final data.

See this Demo Fiddle

with diff as (
  select *, 
    case when lag(prop1) over(order by ts desc ) = prop1 then 0 else 1 end gp1,
    case when lag(prop2) over(order by ts desc ) = prop2 then 0 else 1 end gp2
  from t
), grp as (
  select *, 
    Sum(gp1) over(order by ts desc) grp1, Sum(gp2) over(order by ts desc) grp2
  from diff
)
select Max_Of_Prop_In_Block, prop1, prop2, 
  grp1Count Count_Of_Prop1_In_Block, 
  grp2Count Count_Of_Prop2_In_Block
from (
  select *, 
    Max(ts) over(partition by grp1) Max_Of_Prop_In_Block, 
    Count(*) over(partition by grp1) grp1Count, 
    Count(*) over(partition by grp1, grp2) grp2Count
  from grp
)t
group by Max_Of_Prop_In_Block, grp1Count, prop1, prop2, grp2Count, grp2
order by Max_Of_Prop_In_Block desc, Max(grp2);