Note: I have a working query, but am looking for optimisations to use it on large tables.
Suppose I have a table like this:
id session_id value
1 5 7
2 5 1
3 5 1
4 5 12
5 5 1
6 5 1
7 5 1
8 6 7
9 6 1
10 6 3
11 6 1
12 7 7
13 8 1
14 8 2
15 8 3
I want the id's of all rows with value 1 with one exception: skip groups with value 1 that directly follow a value 7 within the same session_id.
Basically I would look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.
The desired output showing the id's:
5
6
7
11
13
I took some inspiration from this post and ended up with this code:
declare @req_data table (
id int primary key identity,
session_id int,
value int
)
insert into @req_data(session_id, value) values (5, 7)
insert into @req_data(session_id, value) values (5, 1) -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (5, 1) -- ignore this one too
insert into @req_data(session_id, value) values (5, 12)
insert into @req_data(session_id, value) values (5, 1) -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (5, 1) -- show this too
insert into @req_data(session_id, value) values (5, 1) -- show this too
insert into @req_data(session_id, value) values (6, 7)
insert into @req_data(session_id, value) values (6, 1) -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (6, 3)
insert into @req_data(session_id, value) values (6, 1) -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (7, 7)
insert into @req_data(session_id, value) values (8, 1) -- new session_id, show this
insert into @req_data(session_id, value) values (8, 2)
insert into @req_data(session_id, value) values (8, 3)
select id
from (
select session_id, id, max(skip) over (partition by grp) as 'skip'
from (
select tWithGroups.*,
( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
from (
select session_id, id, value,
case
when lag(value) over (partition by session_id order by session_id) = 7
then 1
else 0
end as 'skip'
from @req_data
) as tWithGroups
) as tWithSkipField
where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id
This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.
Is there a cleaner, faster way to do this?
SELECT CRow.id
FROM @req_data AS CRow
CROSS APPLY (SELECT MAX(id) AS id FROM @req_data PRev WHERE PRev.Id < CRow.id AND PRev.session_id = CRow.session_id AND PRev.value <> 1 ) MaxPRow
LEFT JOIN @req_data AS PRow ON MaxPRow.id = PRow.id
WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7