Search code examples
postgresqlgreenplumunique-index

How should I deal with my UNIQUE constraints during my data migration from Postgres9.4 to Greenplum


when I execute the following sql (which is contained by a sql file generated by pg_dump of Postgres9.4) in greenplum:

CREATE TABLE "public"."trm_concept" (
"pid" int8 NOT NULL,
"code" varchar(100)  NOT NULL,
"codesystem_pid" int8,
"display" varchar(400) ,
"index_status" int8,
CONSTRAINT "trm_concept_pkey" PRIMARY KEY ("pid"),
CONSTRAINT "idx_concept_cs_code" UNIQUE ("codesystem_pid", "code")
);

I got this error:

ERROR:  Greenplum Database does not allow having both PRIMARY KEY and UNIQUE constraints

why greenplum doesn't allow this? I really need this unique constraint to guarantee some rule, how can I fix it in greenplum?


Solution

    • a UNIQUE constraint is done with a btree index
    • Primary Keys imply UNIQUE and NOT NULL
    • GreenPlum distributes to child/shards or whatever on whatever you claim as UNIQUE.

    For GreenTree to implement a UNIQUE constraint -- as you want -- that index would have to be

    • copied to every child
    • updated in an ACID compliant manner

    Doing that would totally remove the benefits of running GreenPlum. You may as well move back to PostgreSQL.

    From the docs about CREATE TABLE

    When creating a table, there is an additional clause to declare the Greenplum Database distribution policy. If a DISTRIBUTED BY or DISTRIBUTED RANDOMLY clause is not supplied, then Greenplum assigns a hash distribution policy to the table using either the PRIMARY KEY (if the table has one) or the first column of the table as the distribution key. Columns of geometric or user-defined data types are not eligible as Greenplum distribution key columns. If a table does not have a column of an eligible data type, the rows are distributed based on a round-robin or random distribution. To ensure an even distribution of data in your Greenplum Database system, you want to choose a distribution key that is unique for each record, or if that is not possible, then choose DISTRIBUTED RANDOMLY.

    Same doc says this about PRIMARY KEY,

    For a table to have a primary key, it must be hash distributed (not randomly distributed), and the primary key The column(s) that are unique must contain all the columns of the Greenplum distribution key.

    Here is what the docs on CREATE INDEX

    In Greenplum Database, unique indexes are allowed only if the columns of the index key are the same as (or a superset of) the Greenplum distribution key. On partitioned tables, a unique index is only supported within an individual partition - not across all partitions.