Search code examples
databasepostgresqlpostgres-16

How to avoid circular relations in postgres using cycle detection


I'm trying to use CYCLE detection in PostgreSQL to avoid circular nesting of groups but cannot figure it out.

I have the following PostgreSQL 16 table:

CREATE TABLE groups (
  id serial PRIMARY KEY,
  name text NOT NULL
);

INSERT INTO groups (name)
VALUES
  ('group1'),
  ('group2'),
  ('group3'),
  ('group4');

SELECT * FROM groups;

Output:

id name
1 group1
2 group2
3 group3
4 group4

And the following junction table:

CREATE TABLE related_groups_self (
  parent_id integer REFERENCES groups (id), 
  child_id integer REFERENCES groups (id),
  PRIMARY KEY (parent_id, child_id)
);

INSERT INTO related_groups_self (parent_id, child_id) 
VALUES
    (1, 2),
    (2, 3),
    (3, 4);

SELECT * FROM related_groups_self

Output:

parent_id child_id
1 2
2 3
3 4

If my app user was to attempt to add group 4 as the parent of group 1; (4, 1), how can I detect this using CYCLE detection?

I want to avoid inserting the tuple until after the check has verified as safe from circular nesting, so the incorrect entry won't exist in the junction table already.

I'm no DBA, and I've been banging my head against this issue using multiple different methods from various blog posts on the subject with very little success. Here's the latest effort, although obviously not working:

WITH RECURSIVE tree AS (
    SELECT child_id, 1 AS level
    FROM related_groups_self rgs
    WHERE rgs.parent_id = 1
  UNION
    SELECT rgs.child_id, tree.level + 1
    FROM tree
    JOIN related_groups_self as rgs ON(tree.child_id = rgs.parent_id)
) CYCLE child_id SET is_cycle USING cycle_path
SELECT * FROM tree;

Any help would be appreciated


Solution

  • If you want to prevent cycles you don't have to use the CYCLE keyword, since you don't have cycle to begin with.

    Instead if you want to add the connection A->B you must ensure that no path from B->A exists.

    To do this you can start with a select giving you all paths:

    with recursive tree(parent) as (
        select r.parent_id, r.child_id, 0 level
        from related_groups_self as r
      UNION ALL
        select r.parent_id, r.child_id, t.level + 1 level
        from related_groups_self as r, tree t
        where r.parent_id = t.child_id
      
    )
    select * from tree
    

    Now limit it to those connecting the end points of the potential new connection:

    
    with recursive tree(parent) as (
        select r.parent_id, r.child_id, 0 level
        from related_groups_self as r
        where r.parent_id = 1 -- new child id
      UNION ALL
        select r.parent_id, r.child_id, t.level + 1 level
        from related_groups_self as r, tree t
        where r.parent_id = t.child_id
      
    )
    select * from tree
    where child_id = 4 -- new parent id
    

    Note that we can limit the parent_id on the inside of the recursive select, because the path has to start there. But we have to limit the child_id on the outside because intermediate steps might go through other child_id values.

    We can further tune this by limiting the result to 1 row or by using an exists construct. What exactly to use depends on where you want to use this:

    with recursive tree(parent) as (
        select r.parent_id, r.child_id, 0 level
        from related_groups_self as r
        where r.parent_id = 1 -- new child id
      UNION ALL
        select r.parent_id, r.child_id, t.level + 1 level
        from related_groups_self as r, tree t
        where r.parent_id = t.child_id
      
    )
    select * from tree
    where child_id = 4 -- new parent id
    limit 1
    

    DB Fiddle

    When applying this test you must ensure that no inserts or updates happen between your check and the insert, which probably means a table lock.

    Alternatively you could allow cycles to exist for a short time and run checks on a regular basis, possibly after every table update in order to eliminate them.