Search code examples
sqlsql-serverselectaggregate-functions

Counting sequenced records in ranges, based on grouped varchar values


I'm trying to calculate count of same values, in a certain pattern, where the pattern might occur again.

To simply, please see below data, and expected result. Solution is implemented in SQL Server 2016 and above ...

DECLARE @dataTable TABLE
                   (
                       ts DATETIME, 
                       prop1 VARCHAR(4), 
                       prop2 VARCHAR(2)
                   );

INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:51:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:50:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:49:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:48:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:47:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:46:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:45:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:44:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:43:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:41:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:40:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:39:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:38:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:37:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:36:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:35:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:34:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:33:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:31:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:30:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:29:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:28:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:27:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:26:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:25:00', 'CCCC', '02');

SELECT 
    *, 
    MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC), 
    COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM 
    @dataTable 
ORDER BY 
    ts DESC;

The blocks are sequenced based on ts column.

The expected result should look like:

MAX_OF_PROP1_IN_BLOCK PROP1 PROP2 COUNT_PROP1_IN_BLOCK COUNT_OF_PROP2_IN_BLOCK
2023-05-01 10:51:00 AAAA 01 3 3
2023-05-01 10:48:00 BBBB 01 3 3
2023-05-01 10:45:00 AAAA 01 4 2
2023-05-01 10:45:00 AAAA 02 4 2
2023-05-01 10:41:00 CCCC 01 3 2
2023-05-01 10:41:00 CCCC 02 3 1
2023-05-01 10:38:00 BBBB 01 3 3
2023-05-01 10:35:00 AAAA 01 2 2
2023-05-01 10:33:00 CCCC 02 9 2
2023-05-01 10:33:00 CCCC 01 9 5
2023-05-01 10:33:00 CCCC 02 9 2

I've tried experimenting with window functions, but I'm keep getting always count/max values based on prop1/prop2 within whole scope, and not in ranges defined by prop1/prop2 values. It is possible to preorganized data with CTE query before...

SELECT * 
, MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC)
, COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM @dataTable
ORDER BY ts DESC;

I believe the solution is quite simple, but after several tries, and searching the internet, I could not find appropriate example. I believe, RANGE should be used, but I could not get this to work.

Any help/advice would be greatly appreciated.


Solution

  • On the assumption that having two rows with 10:32:00 is a typo, you have a gaps and islands problem.

    The method I prefer is to set a flag by using lag whenever the value in the next row is different, then sum this to provide a running total that represents each island; with this in place you can then count and aggregate the final data.

    See this Demo Fiddle

    with diff as (
      select *, 
        case when lag(prop1) over(order by ts desc ) = prop1 then 0 else 1 end gp1,
        case when lag(prop2) over(order by ts desc ) = prop2 then 0 else 1 end gp2
      from t
    ), grp as (
      select *, 
        Sum(gp1) over(order by ts desc) grp1, Sum(gp2) over(order by ts desc) grp2
      from diff
    )
    select Max_Of_Prop_In_Block, prop1, prop2, 
      grp1Count Count_Of_Prop1_In_Block, 
      grp2Count Count_Of_Prop2_In_Block
    from (
      select *, 
        Max(ts) over(partition by grp1) Max_Of_Prop_In_Block, 
        Count(*) over(partition by grp1) grp1Count, 
        Count(*) over(partition by grp1, grp2) grp2Count
      from grp
    )t
    group by Max_Of_Prop_In_Block, grp1Count, prop1, prop2, grp2Count, grp2
    order by Max_Of_Prop_In_Block desc, Max(grp2);