Search code examples
sqlsql-servert-sqlsql-server-2014

skip consecutive rows after specific value


Note: I have a working query, but am looking for optimisations to use it on large tables.

Suppose I have a table like this:

id  session_id  value
1       5           7
2       5           1
3       5           1
4       5           12
5       5           1
6       5           1
7       5           1
8       6           7
9       6           1
10      6           3
11      6           1
12      7           7
13      8           1
14      8           2
15      8           3

I want the id's of all rows with value 1 with one exception: skip groups with value 1 that directly follow a value 7 within the same session_id.

Basically I would look for groups of value 1 that directly follow a value 7, limited by the session_id, and ignore those groups. I then show all the remaining value 1 rows.

The desired output showing the id's:

5
6
7
11
13

I took some inspiration from this post and ended up with this code:

declare @req_data table (
    id int primary key identity,
    session_id int,
    value int
)

insert into @req_data(session_id, value) values (5, 7)
insert into @req_data(session_id, value) values (5, 1)  -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (5, 1)  -- ignore this one too
insert into @req_data(session_id, value) values (5, 12)
insert into @req_data(session_id, value) values (5, 1)  -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (5, 1)  -- show this too
insert into @req_data(session_id, value) values (5, 1)  -- show this too
insert into @req_data(session_id, value) values (6, 7)
insert into @req_data(session_id, value) values (6, 1)  -- preceded by value 7 in same session, should be ignored
insert into @req_data(session_id, value) values (6, 3)
insert into @req_data(session_id, value) values (6, 1)  -- preceded by value != 7, show this
insert into @req_data(session_id, value) values (7, 7)
insert into @req_data(session_id, value) values (8, 1)  -- new session_id, show this
insert into @req_data(session_id, value) values (8, 2)
insert into @req_data(session_id, value) values (8, 3)



select id
from (
    select session_id, id, max(skip) over (partition by grp) as 'skip'
    from (
        select tWithGroups.*,
            ( row_number() over (partition by session_id order by id) - row_number() over (partition by value order by id) ) as grp
        from (
            select session_id, id, value,
                case
                    when lag(value) over (partition by session_id order by session_id) = 7
                        then 1
                    else 0
                end as 'skip'
            from @req_data
        ) as  tWithGroups
    ) as tWithSkipField
    where tWithSkipField.value = 1
) as tYetAnotherOutput
where skip != 1
order by id

This gives the desired result, but with 4 select blocks I think it's way too inefficient to use on large tables.

Is there a cleaner, faster way to do this?


Solution

  • SELECT CRow.id
    FROM @req_data AS CRow
    CROSS APPLY (SELECT MAX(id) AS id FROM @req_data PRev WHERE PRev.Id < CRow.id  AND PRev.session_id = CRow.session_id AND  PRev.value <> 1 ) MaxPRow
    LEFT JOIN @req_data AS PRow ON MaxPRow.id = PRow.id
    WHERE CRow.value = 1 AND ISNULL(PRow.value,1) <> 7