I'm having a hard time importing a large data set (~30 million records) into Postgres 14.4 (PostgreSQL 14.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit; hosted as a db.m6i.2xlarge AWS RDS instance, to be specific).
My first attempt was to stream the file into my app and issue INSERT statements in chunks of 1,000 rows at a time. This took a very long time (over 24 hours), and ultimately the db connections timed out, so the task never completed.
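For illustration, each chunk was roughly a multi-row statement of this shape (a sketch; the values are taken from the sample rows shown below):
INSERT INTO core.mytable (a, b, c) VALUES
  ('apple', 'browse', 'https://example.com/p1'),
  ('apple', 'browse', 'https://example.com/p2'),
  -- ... up to 1,000 rows per statement ...
  ('apple', 'browse', 'https://example.com/p3');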
So I thought I should use psql and \copy from the command line.
psql \
-h foo.bar.us-east-2.rds.amazonaws.com -d mydb -U someuser \
-c "\copy core.mytable (a,b,c) from '/path/to/big/no-header.csv' with delimiter as ','"
Sample rows from the CSV:
"apple","browse","https://example.com/p1"
"apple","browse","https://example.com/p2"
"apple","browse","https://example.com/p3"
I worked through a couple errors with how I had quoted values in the file, but now I'm getting errors about foreign key constraints:
ERROR: insert or update on table "mytable" violates foreign key constraint "mytable_a_fkey"
DETAIL: Key (a)=("apple") is not present in table "alphas".
I can confirm that the foreign key value listed in the error is very much present in the database. I can do the following to see that yes, my alphas table is there, and yes, the value referenced in the error is present:
SET search_path TO 'core';
\dt
SELECT * FROM alphas;
   id   |     created_at      |     updated_at
--------+---------------------+---------------------
 apple  | 2023-02-10 22:25:41 | 2023-02-10 22:25:41
 banana | 2023-02-10 22:25:41 | 2023-02-10 22:25:41
 cherry | 2023-02-10 22:25:41 | 2023-02-10 22:25:41
To belabor the point, I can freely insert the same data from the CSV in a regular Postgres query, and it works without problems, as expected, e.g.
INSERT INTO core.mytable (a, b, c) VALUES ('apple','browse','https://example.com/p1');
Additional info:
show search_path;
search_path
-----------------
"$user", public
(1 row)
=> SET search_path TO 'core';
=> show search_path;
search_path
-------------
core
(1 row)
=> \dt *.alphas
        List of relations
 Schema |  Name  | Type  |  Owner
--------+--------+-------+----------
 core   | alphas | table | postgres
(1 row)
=> \d alphas
Table "core.alphas"
Column | Type | Collation | Nullable | Default
--------------+--------------------------------+-----------+----------+---------
id | character varying(32) | | not null |
created_at | timestamp(0) without time zone | | not null |
updated_at | timestamp(0) without time zone | | not null |
Indexes:
"alphas_pkey" PRIMARY KEY, btree (id)
"alphas_lower_id_index" UNIQUE, btree (lower(id::text))
Referenced by:
REFERENCES alphas(id) ON DELETE CASCADE
TABLE "mytable" CONSTRAINT "mytable_a_fkey" FOREIGN KEY (a)
This has been super frustrating, but I figure I must be missing something. I've double-checked that I'm logging into the correct database instance, and I'm about out of ideas for what else to check. Where is this foreign key error coming from? Can anyone shed light on this confounding problem?
Your error message says:
ERROR: insert or update on table "mytable" violates foreign key constraint "mytable_a_fkey"
DETAIL: Key (a)=("apple") is not present in table "alphas".
Notably, (a)=("apple"), with double-quotes. If you wrote apple (without quotes) to the column and that key were missing, Postgres would complain about (a)=(apple).
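You can see this for yourself with a quick sketch in psql (the temp table here is hypothetical, mirroring your three columns). In the default text format, double-quotes are ordinary characters and become part of the stored value:
CREATE TEMP TABLE demo (a text, b text, c text);
\copy demo from '/path/to/big/no-header.csv' with delimiter as ','
SELECT a, length(a) FROM demo LIMIT 1;
    a    | length
---------+--------
 "apple" |      7
(1 row)
That 7-character value "apple" (quotes included) is what the FK lookup compares against alphas.id, and that value indeed does not exist there.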
Fix your quotes or, rather, switch to CSV format in your COPY command. Your sample very much looks like CSV format, and the file name indicates as much.
\copy core.mytable(a,b,c) FROM '/path/to/big/no-header.csv' WITH (FORMAT csv);
QUOTE
Specifies the quoting character to be used when a data value is quoted. The default is double-quote. This must be a single one-byte character. This option is allowed only when using CSV format.
You can drop the DELIMITER clause now, since:
DELIMITER
Specifies the character that separates columns within each row (line) of the file. The default is a tab character in text format, a comma in CSV format. [...]
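Putting it together with your original invocation (same host, database, and file path as in your question):
psql \
-h foo.bar.us-east-2.rds.amazonaws.com -d mydb -U someuser \
-c "\copy core.mytable (a,b,c) from '/path/to/big/no-header.csv' with (format csv)"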