I have hierarchy data where each record belongs to a parent group, and need to retreive all records plus their parent record, but do not wish to duplicate a record if it already was included in the initial non-recursive term.
For example, the following query returns the first six records where provided by the non-recursive term, and then the recursive term returns the next five records for the parents, however three of the parent records (child_id 4528, 4539 and 4541) already were returned by the initial non-recursive term and should not be duplicated.
I tried using DISTINCT, adding cs.id!=t.child_id
(no effect?) to the WHERE clause, using t
in a subquery (evidently not possible) with no success.
How can this query be modified to not return duplicated rows?
facdocs=> WITH RECURSIVE t(id, child_id, parent_id) AS (
SELECT s.id, cs.id child_id, cs.parent_id, cs.name, cs.spec
FROM project p
INNER JOIN specification s ON s.project_id=p.id
INNER JOIN csi_spec cs ON cs.id=s.csi_spec_id
WHERE p.id = 1
UNION
SELECT null id, cs.id child_id, cs.parent_id, cs.name, cs.spec
FROM t, csi_spec cs
WHERE cs.id=t.parent_id AND cs.id!=t.child_id AND cs.id
)
SELECT * FROM t;
id | child_id | parent_id | name | spec
----+----------+-----------+----------------------------------------------------+-------------
7 | 4543 | 4541 | Medium-Voltage, Single- and Multi-Conductor Cables | 26 05 13.16
8 | 4528 | 2 | Electrical | 26 00 00
9 | 4539 | 4528 | Common Work Results for Electrical | 26 05 00
11 | 4541 | 4539 | Medium-Voltage Cables | 26 05 13
12 | 4542 | 4541 | Medium-Voltage Open Conductors | 26 05 13.13
13 | 4578 | 4573 | Wiring Device Schedule | 26 06 20.26
| 4528 | 2 | Electrical | 26 00 00
| 4539 | 4528 | Common Work Results for Electrical | 26 05 00
| 4541 | 4539 | Medium-Voltage Cables | 26 05 13
| 4573 | 4571 | Schedules for Low-Voltage Electrical Distribution | 26 06 20
| 4571 | 4528 | Schedules for Electrical | 26 06 00
(11 rows)
For refence only, below shows the tables:
facdocs=> \d csi_spec
Table "public.csi_spec"
Column | Type | Collation | Nullable | Default
-----------+------------------------+-----------+----------+---------
id | integer | | not null |
parent_id | integer | | not null |
name | character varying(255) | | not null |
spec | character varying(255) | | not null |
div | character varying(2) | | not null |
section | character varying(2) | | not null |
scope | character varying(2) | | not null |
subscope | character varying(2) | | not null |
Indexes:
"csi_spec_pkey" PRIMARY KEY, btree (id)
"idx_unique_spec" UNIQUE, btree (div, section, scope, subscope)
"uniq_92347c2dc00e173e" UNIQUE, btree (spec)
"idx_92347c2d727aca70" btree (parent_id)
"idx_div" btree (div)
"idx_scope" btree (subscope)
"idx_section" btree (section)
Foreign-key constraints:
"fk_92347c2d727aca70" FOREIGN KEY (parent_id) REFERENCES csi_spec(id)
Referenced by:
TABLE "csi_spec" CONSTRAINT "fk_92347c2d727aca70" FOREIGN KEY (parent_id) REFERENCES csi_spec(id)
TABLE "specification" CONSTRAINT "fk_e3f1a9af4ead25e" FOREIGN KEY (csi_spec_id) REFERENCES csi_spec(id)
facdocs=> \d specification
Table "public.specification"
Column | Type | Collation | Nullable | Default
-------------+--------------------------------+-----------+----------+---------
id | integer | | not null |
project_id | integer | | not null |
owner_id | integer | | not null |
csi_spec_id | integer | | not null |
create_at | timestamp(0) without time zone | | not null |
notes | text | | |
Indexes:
"specification_pkey" PRIMARY KEY, btree (id)
"idx_e3f1a9a166d1f9c" btree (project_id)
"idx_e3f1a9a7e3c61f9" btree (owner_id)
"idx_e3f1a9af4ead25e" btree (csi_spec_id)
Foreign-key constraints:
"fk_e3f1a9a166d1f9c" FOREIGN KEY (project_id) REFERENCES project(id)
"fk_e3f1a9a7e3c61f9" FOREIGN KEY (owner_id) REFERENCES "user"(id)
"fk_e3f1a9af4ead25e" FOREIGN KEY (csi_spec_id) REFERENCES csi_spec(id)
Referenced by:
TABLE "document" CONSTRAINT "fk_d8698a76908e2ffe" FOREIGN KEY (specification_id) REFERENCES specification(id)
facdocs=> \d project
Table "public.project"
Column | Type | Collation | Nullable | Default
-------------+--------------------------------+-----------+----------+-----------------------------------
id | integer | | not null |
account_id | integer | | not null |
name | character varying(255) | | not null |
project_id | character varying(255) | | not null |
create_at | timestamp(0) without time zone | | not null |
is_active | boolean | | not null |
start_at | timestamp(0) without time zone | | | NULL::timestamp without time zone
description | text | | |
Indexes:
"project_pkey" PRIMARY KEY, btree (id)
"name_unique" UNIQUE, btree (name, account_id)
"project_id_unique" UNIQUE, btree (project_id, account_id)
"idx_2fb3d0ee9b6b5fba" btree (account_id)
Foreign-key constraints:
"fk_2fb3d0ee9b6b5fba" FOREIGN KEY (account_id) REFERENCES account(id)
Referenced by:
TABLE "project_vendor" CONSTRAINT "fk_e286d8bc166d1f9c" FOREIGN KEY (project_id) REFERENCES project(id) ON DELETE CASCADE
TABLE "specification" CONSTRAINT "fk_e3f1a9a166d1f9c" FOREIGN KEY (project_id) REFERENCES project(id)
facdocs=>
A simple fix is to exclude the duplicates in the main query:
WITH RECURSIVE t (...)
SELECT DISTINCT ON (child_id) *
FROM t
ORDER BY child_id, id;
This should work, because DISTINCT ON
will return the first row per client_id
and NULL sorts at the end by default.