Search code examples
postgresqlcopypsql

psql \copy command fails with "violates foreign key constraint" when the foreign row is definitely in the database


I'm having a hard time importing a large data set (~30 million records) into PostGres 14.4 (PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit hosted as a db.m6i.2xlarge AWS RDS instance to be specific).

My first attempt was to stream the file into my app and do INSERT statements of chunks of 1000 rows at a time. This took a very long time (over 24 hours) and ultimately the db connections timed out so the task never completed.

So I thought I should use psql and \copy from the command line.

psql \
  -h foo.bar.us-east-2.rds.amazonaws.com -d mydb -U someuser \
  -c "\copy core.mytable (a,b,c)  from '/path/to/big/no-header.csv' with delimiter as ','"

Sample rows from the CSV:

"apple","browse","https://example.com/p1"
"apple","browse","https://example.com/p2"
"apple","browse","https://example.com/p3"

I worked through a couple errors with how I had quoted values in the file, but now I'm getting errors about foreign key constraints:

ERROR:  insert or update on table "mytable" violates foreign key constraint "mytable_a_fkey"
DETAIL:  Key (a)=("apple") is not present in table "alphas".

I can confirm that the foreign key value listed in the error is very much present in the database. I can do the following to see that yes, my alphas table is there and yes, the value referenced in the error is present:

SET search_path TO 'core';
\dt
SELECT * FROM alphas;

    id     |     created_at      |     updated_at
-----------+---------------------+---------------------
 apple     | 2023-02-10 22: 25: 41 | 2023-02-10 22: 25: 41
 banana    | 2023-02-10 22: 25: 41 | 2023-02-10 22: 25: 41
 cherry    | 2023-02-10 22: 25: 41 | 2023-02-10 22: 25: 41

To belabor the point, I can freely insert the same data from the CSV in a regular PostGres query and this works without problems as expected, e.g.

INSERT INTO core.mytable (a, b, c) VALUES ('apple','browse','https://example.com/p1');

Additional info:

show search_path;
   search_path
-----------------
 "$user", public
(1 row)

=> SET search_path TO 'core';
=> show search_path;
 search_path
-------------
 core
(1 row)


=> \dt *.alphas
        List of relations
 Schema | Name | Type  |  Owner
--------+------+-------+----------
 core   | alphas | table | postgres
(1 row)

=> \d alphas
                               Table "core.alphas"
    Column    |              Type              | Collation | Nullable | Default
--------------+--------------------------------+-----------+----------+---------
 id           | character varying(32)          |           | not null |
 created_at   | timestamp(0) without time zone |           | not null |
 updated_at   | timestamp(0) without time zone |           | not null |
Indexes:
    "alphas_pkey" PRIMARY KEY, btree (id)
    "alphas_lower_id_index" UNIQUE, btree (lower(id::text))
Referenced by:
REFERENCES alphas(id) ON DELETE CASCADE
    TABLE "mytable" CONSTRAINT "mytable_a_fkey" FOREIGN KEY (a)

This has been super frustrating, but I figure I must be missing something. I have double-checked that I'm logging into the correct database instance and I'm about out of ideas of what else to check. Where is this foreign key error coming from? Can anyone shed light on this confounding problem?


Solution

  • Your error msg says:

    ERROR:  insert or update on table "mytable" violates foreign key constraint "mytable_a_fkey"   
    DETAIL:  Key (a)=("apple") is not present in table "alphas".
    

    Notably, (a)=("apple"), with double-quotes. If you would write apple (without quotes) to the column and that key would be missing, Postgres would complain about (a)=(apple).

    Fix your quotes or, rather, switch to CSV format in your COPY command. Your sample very much looks like CSV format, and the file name indicates as much.

    \copy core.mytable(a,b,c) FROM '/path/to/big/no-header.csv' WITH (FORMAT csv);
    

    The manual:

    QUOTE

    Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.

    You can drop the DELIMITER clause now, since:

    DELIMITER

    Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. [...]