Search code examples
sqljsonpostgresqlrecursionnode-postgres

Recursive Breadth Traversal Query (with max depth), Returning Nested JSON, Filtering the Initial Seed


(I had originally asked this question on the DBA StackExchange, but didn't get much activity in it).

In my DB, I have tables for items and connections between them, creating a likely sparse graph, with islands:

CREATE TABLE IF NOT EXISTS items (
    --------------------------------------------------------
    id    UUID         NOT NULL DEFAULT uuid_generate_v4(),
    --------------------------------------------------------
    title VARCHAR(300) NOT NULL,
    --------------------------------------------------------
    CONSTRAINT     items_pk
       PRIMARY KEY (id)
    --------------------------------------------------------
);

CREATE TABLE IF NOT EXISTS connections (
    ---------------------------------------------------------------------
    id                  UUID         NOT NULL DEFAULT uuid_generate_v4(),
    ---------------------------------------------------------------------
    origin_item_id      UUID         NOT NULL,
    destination_item_id UUID         NOT NULL,
    ---------------------------------------------------------------------
    title               VARCHAR(300) NOT NULL,
    ---------------------------------------------------------------------
    CONSTRAINT     origin_item_fk
       FOREIGN KEY (origin_item_id)
        REFERENCES items (id),
    CONSTRAINT     destination_item_fk
       FOREIGN KEY (destination_item_id)
        REFERENCES items (id),
    CONSTRAINT     connections_pk
       PRIMARY KEY (id)
    ---------------------------------------------------------------------
);

I'm using node-pg, and I would like to do a breadth traversal from an inital seed (e.g. url parameter on ExpressJS), up to a certain depth (from what I see in the docs, the depth part is probably the easiest).

Ideally, I would like things to be returned in a nested way, but, from reading these docs (WITH queries), I couldn't figure out how, or if it is even possible.

INSERT INTO items (title)
VALUES ('Software'), ('Pyongyang'), ('Burma'), ('Shenzhen'), ('Jerusalem'), ('Hostage');

INSERT INTO connections (origin_item_id, destination_item_id, title)
VALUES
    (
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 1'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Burma'),
        (SELECT id FROM items WHERE title ~ 'Pyongyang'),
        'Same Author - Guy Delisle - 2'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        (SELECT id FROM items WHERE title ~ 'Shenzhen'),
        'Same Author - Guy Delisle - 3'
    ),
    (
        (SELECT id FROM items WHERE title ~ 'Hostage'),
        (SELECT id FROM items WHERE title ~ 'Jerusalem'),
        'Same Author - Guy Delisle - 4'
    ),

The recursive query would then result in, for Pyongyang and maxDepth = 2 — note neither Hostage nor Software should appear then —:

{
  "title": "Pyongyang",
  "connections": [
    {
      "title": "Same Author - Guy Delisle - 1",
      "destination_item": {
        "title": "Shenzhen",
        "connections": [
          {
            "title": "Same Author - Guy Delisle - 2",
            "origin_item": {
              "title": "Jerusalem"
            }
          }
        ]
      }
    },
    {
      "title": "Same Author - Guy Delisle - 2",
      "origin_item": {
        "title": "Burma"
      }
    }
  ]
}

(I'm not 100% sure if I've covered all the cases with this example.)

Is this type of nesting even possible in PostgreSQL (with, say, jsonb_build_object or jsonb_agg)? How would I do it?

Here is what I've been able to do:

WITH RECURSIVE connections_and_items AS (
    SELECT
         c.id AS connection_id,
         c.title AS connection_title,
         c.origin_item_id,
         i.title AS origin_item_title,
         c.destination_item_id,
        it.title AS destination_item_title

    FROM connections c
    JOIN items       i
      ON  i.id = c.origin_item_id
    JOIN items       it
      ON it.id = c.destination_item_id
),
connected_items AS (
    SELECT *
    FROM connections_and_items
    WHERE origin_item_id = '${itemId}'::UUID

    UNION 
        
    SELECT c.*
    FROM connections_and_items c
    JOIN connected_items       ci
      ON ci.destination_item_id = c.origin_item_id
      OR ci.origin_item_id      = c.destination_item_id
) 

SELECT 
    ci.connection_id, 
    ci.connection_title,
    jsonb_build_object(
        'id',    ci.origin_item_id,
        'title', ci.origin_item_title
    ) origin_item,
    jsonb_build_object(
        'id',    ci.destination_item_id,
        'title', ci.destination_item_title
    ) destination_item

FROM connected_items ci

This actually does traverse the graph successfully, in both directions (!). However, I haven't been able to add a maxDepth constraint (having trouble detecting cycles), and the nesting is still not there.

So, in summary, here's what I have and what I'm missing:

  • ✔ Recursive Breadth Traversal
  • ✗ Max Depth
  • ✗ Nested JSON
  • ✔ Filtering the Initial Seed

References


Solution

  • Very late to the party, but I think the easiest way to solve this is with a recursive function. This can take parameters of the starting node and the maximum depth to search to. This function also has a third input of seen nodes to prevent cycling along any branch of the tree:

    CREATE OR REPLACE FUNCTION build_node(node INT, max_depth INT, seen INT[] default ARRAY[]::INT[])
    RETURNS JSONB
    AS $$
    DECLARE 
      _conns JSONB;
    BEGIN
      IF max_depth = 0 THEN
        RETURN jsonb_build_object(
          'title', (SELECT title FROM items WHERE id = node)
        );
      END IF;
      seen = ARRAY_APPEND(seen, node);
      SELECT jsonb_agg(json_build_object('title', title, 
        CASE WHEN origin_item_id = node THEN 'destination_item' ELSE 'origin_item' END,
        build_node(CASE WHEN origin_item_id = node THEN destination_item_id ELSE origin_item_id END, max_depth - 1, seen)
        )
      ) INTO _conns
      FROM connections
      WHERE origin_item_id = node AND NOT destination_item_id = ANY(seen)
         OR destination_item_id = node AND NOT origin_item_id = ANY(seen);
      IF jsonb_array_length(_conns) > 0 THEN
        RETURN jsonb_build_object(
          'title', (SELECT title FROM items WHERE id = node),
          'connections', _conns
        );
      ELSE
        RETURN jsonb_build_object(
          'title', (SELECT title FROM items WHERE id = node)
        );
      END IF;
    END;
    $$
    LANGUAGE plpgsql
    

    Sample usage:

    SELECT build_node(2, 2)
    

    Output (for your sample data):

    {
        "title": "Pyongyang",
        "connections": [{
            "title": "Same Author - Guy Delisle - 1",
            "destination_item": {
                "title": "Shenzhen",
                "connections": [{
                    "title": "Same Author - Guy Delisle - 3",
                    "origin_item": {
                        "title": "Jerusalem"
                    }
                }]
            }
        }, {
            "title": "Same Author - Guy Delisle - 2",
            "origin_item": {
                "title": "Burma"
            }
        }]
    }
    

    Demo on dbfiddle