Search code examples
postgresqlshardingcitus

Citus: How can I add self referencing table in distributed tables list


I'm trying to run create_distributed_table for tables which i need to shard and almost all of the tables have self relation ( parent child ) but when I run SELECT create_distributed_table('table-name','id'); it throws error cannot create foreign key constraint

simple steps to reproduce

CREATE TABLE TEST (
  ID        TEXT                 NOT NULL,
  NAME      CHARACTER VARYING(255) NOT NULL,
  PARENT_ID TEXT
);

ALTER TABLE TEST ADD CONSTRAINT TEST_PK PRIMARY KEY (ID);

ALTER TABLE TEST  ADD CONSTRAINT TEST_PARENT_FK FOREIGN KEY (PARENT_ID) REFERENCES TEST (ID);

ERROR

citus=> SELECT create_distributed_table('test','id');
ERROR:  cannot create foreign key constraint
DETAIL:  Foreign keys are supported in two cases, either in between two colocated tables including partition column in the same ordinal in the both tables or from distributed to reference tables

Solution

  • For the time being, it is not possible to shard a table on PostgreSQL without dropping the self referencing foreign key constraints, or altering them to include a separate and new distribution column.

    Citus places records into shards based on the hash values of the distribution column values. It is most likely the case that the hashes of parent and child id values are different and hence the records should be stored in different shards, and possibly on different worker nodes. PostgreSQL does not have a mechanism to create foreign key constraints that reference records on different PostgreSQL clusters.

    Consider adding a new column tenant_id and adding this column to the primary key and foreign key constraints.

    CREATE TABLE TEST (
      tenant_id INT                    NOT NULL,
      id        TEXT                   NOT NULL,
      name      CHARACTER VARYING(255) NOT NULL,
      parent_id TEXT                   NOT NULL,
    
      FOREIGN KEY (tenant_id, parent_id) REFERENCES test(tenant_id, id),
      PRIMARY KEY (tenant_id, id)
    );
    
    SELECT create_distributed_table('test','tenant_id');
    

    Note that parent and child should always be in the same tenant for this to work.