the goal is to depulicate the IDs in P1ID and P2ID.
I'm working on SQL Server 2019.
CREATE TABLE InitialResultSet (
ID INT,
ID_GROUP VARCHAR(10),
P1ID BIGINT,
P2ID BIGINT,
P1_RANK VARCHAR(10),
P2_RANK VARCHAR(10)
);
INSERT INTO InitialResultSet (ID, ID_GROUP , P1ID, P2ID, P1_RANK , P2_RANK ) VALUES
(1, 'A', 1, 2, '0001', '0002'),
(2, 'A', 2, 3, '0002', '0005'),
(3, 'B', 4, 5, '0002', '0026'),
(4, 'B', 6, 8, '0017', '0024'),
(5, 'B', 6, 9, '0017', '0043'),
(6, 'B', 6, 10, '0017', '0044'),
(7, 'B', 8, 7, '0024', '0098'),
(8, 'B', 9, 7, '0043', '0098'),
(9, 'B', 10, 7, '0044', '0098');
CREATE TABLE DesiredOutcome (
ID INT,
ID_GROUP VARCHAR(10),
P1ID BIGINT,
P2ID BIGINT,
P1_RANK VARCHAR(10),
P2_RANK VARCHAR(10)
);
INSERT INTO DesiredOutcome (ID, ID_GROUP , P1ID, P2ID, P1_RANK, P2_RANK) VALUES
(1, 'A', 1, 2, '0001', '0002'),
(3, 'B', 4, 5, '0002', '0026'),
(4, 'B', 6, 8, '0017', '0024'),
(8, 'B', 9, 7, '0043', '0098');
InitialResultSet
ID | ID_GROUP | P1ID | P2ID | P1_RANK | P2_RANK |
---|---|---|---|---|---|
1 | A | 1 | 2 | 0001 | 0002 |
2 | A | 2 | 3 | 0002 | 0005 |
3 | B | 4 | 5 | 0002 | 0026 |
4 | B | 6 | 8 | 0017 | 0024 |
5 | B | 6 | 9 | 0017 | 0043 |
6 | B | 6 | 10 | 0017 | 0044 |
7 | B | 8 | 7 | 0024 | 0098 |
8 | B | 9 | 7 | 0043 | 0098 |
9 | B | 10 | 7 | 0044 | 0098 |
DesiredOutcome
ID | ID_GROUP | P1ID | P2ID | P1_RANK | P2_RANK |
---|---|---|---|---|---|
1 | A | 1 | 2 | 0001 | 0002 |
3 | B | 4 | 5 | 0002 | 0026 |
4 | B | 6 | 8 | 0017 | 0024 |
8 | B | 9 | 7 | 0043 | 0098 |
Here's the way to do it recursively. This follows each ID_GROUP
in order and determines which nodes are kept. It generates and uses a text string of ids to match for the ones that have already been retained:
with dataX as (
select *,
row_number() over (partition by ID_GROUP order by ID) as gn,
cast(concat('|', P1ID, '|', P2ID, '|') as varchar(max)) as payload
from InitialResultSet
), chain as (
select ID, ID_GROUP, gn, payload as allocated, 'Y' as keep
from dataX where gn = 1
union all
select d.ID, d.ID_GROUP, d.gn,
concat(c.allocated,
case when allocated not like concat('%|', P1ID, '|%')
and allocated not like concat('%|', P2ID, '|%') then payload end),
case when allocated not like concat('%|', P1ID, '%')
and allocated not like concat('%|', P2ID, '|%') then 'Y' else 'N' end
from chain c inner join dataX d
on d.ID_GROUP = c.ID_GROUP and d.gn = c.gn + 1
)
select * from InitialResultSet
where ID in (select ID from chain where keep = 'Y');
Based on OP's description of ID
it would be possible to use that value instead of gn
in order to follow the nodes in sequence. (You might possibly get a better plan by explicitly sorting on the same columns too.)
Using gn
does have the advantage in terms of recursion depth though. This query will never need more iterations than the size of the largest ID_GROUP
whereas using ID
will only add a single row on each pass and will involve significantly longer strings and corresponding scans. As a side benefit, this would also work under a different scenario where id values can be repeated across different ID_GROUPS
.