I'm trying to calculate count of same values, in a certain pattern, where the pattern might occur again.
To simply, please see below data, and expected result. Solution is implemented in SQL Server 2016 and above ...
DECLARE @dataTable TABLE
(
ts DATETIME,
prop1 VARCHAR(4),
prop2 VARCHAR(2)
);
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:51:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:50:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:49:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:48:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:47:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:46:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:45:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:44:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:43:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'AAAA', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:41:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:40:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:39:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:38:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:37:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:36:00', 'BBBB', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:35:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:34:00', 'AAAA', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:33:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:32:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:31:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:30:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:29:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:28:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:27:00', 'CCCC', '01');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:26:00', 'CCCC', '02');
INSERT INTO @dataTable (ts, prop1, prop2) VALUES ('2023-05-01 10:25:00', 'CCCC', '02');
SELECT
*,
MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC),
COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM
@dataTable
ORDER BY
ts DESC;
The blocks are sequenced based on ts column.
The expected result should look like:
MAX_OF_PROP1_IN_BLOCK | PROP1 | PROP2 | COUNT_PROP1_IN_BLOCK | COUNT_OF_PROP2_IN_BLOCK |
---|---|---|---|---|
2023-05-01 10:51:00 | AAAA | 01 | 3 | 3 |
2023-05-01 10:48:00 | BBBB | 01 | 3 | 3 |
2023-05-01 10:45:00 | AAAA | 01 | 4 | 2 |
2023-05-01 10:45:00 | AAAA | 02 | 4 | 2 |
2023-05-01 10:41:00 | CCCC | 01 | 3 | 2 |
2023-05-01 10:41:00 | CCCC | 02 | 3 | 1 |
2023-05-01 10:38:00 | BBBB | 01 | 3 | 3 |
2023-05-01 10:35:00 | AAAA | 01 | 2 | 2 |
2023-05-01 10:33:00 | CCCC | 02 | 9 | 2 |
2023-05-01 10:33:00 | CCCC | 01 | 9 | 5 |
2023-05-01 10:33:00 | CCCC | 02 | 9 | 2 |
I've tried experimenting with window functions, but I'm keep getting always count/max values based on prop1/prop2 within whole scope, and not in ranges defined by prop1/prop2 values. It is possible to preorganized data with CTE query before...
SELECT *
, MAX(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC)
, COUNT(ts) OVER (PARTITION BY prop1 ORDER BY ts DESC )
FROM @dataTable
ORDER BY ts DESC;
I believe the solution is quite simple, but after several tries, and searching the internet, I could not find appropriate example. I believe, RANGE should be used, but I could not get this to work.
Any help/advice would be greatly appreciated.
On the assumption that having two rows with 10:32:00
is a typo, you have a gaps and islands problem.
The method I prefer is to set a flag by using lag
whenever the value in the next row is different, then sum this to provide a running total that represents each island; with this in place you can then count and aggregate the final data.
See this Demo Fiddle
with diff as (
select *,
case when lag(prop1) over(order by ts desc ) = prop1 then 0 else 1 end gp1,
case when lag(prop2) over(order by ts desc ) = prop2 then 0 else 1 end gp2
from t
), grp as (
select *,
Sum(gp1) over(order by ts desc) grp1, Sum(gp2) over(order by ts desc) grp2
from diff
)
select Max_Of_Prop_In_Block, prop1, prop2,
grp1Count Count_Of_Prop1_In_Block,
grp2Count Count_Of_Prop2_In_Block
from (
select *,
Max(ts) over(partition by grp1) Max_Of_Prop_In_Block,
Count(*) over(partition by grp1) grp1Count,
Count(*) over(partition by grp1, grp2) grp2Count
from grp
)t
group by Max_Of_Prop_In_Block, grp1Count, prop1, prop2, grp2Count, grp2
order by Max_Of_Prop_In_Block desc, Max(grp2);