Search code examples
postgresqlpostgresql-15

Get all rows where two ids appear the first time



i need help with a problem that drives me nuts.
Imagine you have the following table with just three ids.
CREATE TABLE candidates
(
    row_num BIGINT,
    in_id  BIGINT,
    out_id BIGINT
);

Fill it with some data:

INSERT INTO candidates (row_num, in_id, out_id)
VALUES (1, 220, 210),
       (2, 208, 210),
       (3, 216, 210),
       (4, 220, 214),
       (5, 208, 214),
       (6, 216, 214),
       (7, 212, 218);

What i now want is to select all rows in a way that each in_id and out_id is unique across all selected rows.
So we'd select the first row, because we didn't see either the in_id 220 nor the out_id 210 before. The pair (220,210) is selected and therefore no following row is allowed to have the in_id 220 or the out_id 210. That's why we need to skip the next 3 rows (they either have in_id 220 or out_id 210). Now we end up at row_num 5. Neither 208 nor 214 where selected before. Thats why it also should be part of the final result set. Same goes for row_num 7.

The final result set should look like:

row_num | in_id | out_id
   1       220      210
   5       208      214
   7       212      218

Seems so simple but i can't solve it. Cleanest try is the following query which has the problem that despite that it knows which ids appeared before it can't detect if a row was selected or not. Breaks for the case of row_num 5.

SELECT *
FROM candidates tc1
WHERE NOT EXISTS (
  SELECT
  FROM candidates tc2
  WHERE (tc2.in_id = tc1.in_id OR tc2.out_id = tc1.out_id)
    AND tc2.row_num < tc1.row_num
);

Any help greatly appreciated, thanks!


Solution

  • This can be done using WITH recursive :

    with recursive cte as (
      select row_num, in_id, out_id, 1 as exist
      from candidates
      where row_num = 1
      union all
      select t.row_num, t.in_id, t.out_id, exist+1
      from cte c
      inner join candidates t on t.row_num > c.row_num
      and (t.in_id <> c.in_id and t.out_id <> c.out_id)
    )
    select t.row_num, t.in_id, t.out_id
    from candidates t
    inner join (
      select min(row_num) as row_num, exist
      from cte
      group by exist
    ) as s on s.row_num = t.row_num
    

    The recursive CTE used to get records recursively by joining the previous records with the current table where current row bigger than already found records and current in_id and out_id not exist in previous records.

    Demo here