I'm loading ~45 million records from a CSV file with \COPY. The data contains x, y, z values representing 3D coordinates.
CREATE TABLE mytable (
    id integer,
    x double precision,
    y double precision,
    z double precision,
    coordinate geometry DEFAULT NULL,
    CONSTRAINT mytable_pkey PRIMARY KEY (id)
);
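For reference, the import command looks roughly like this (the file name and CSV options here are placeholders, not my actual setup):

-- 'points.csv' stands in for the real file path
\copy mytable (id, x, y, z) from 'points.csv' with (format csv)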
The geometry type comes from PostGIS. I'd like to create the coordinate value on the fly with ST_MakePoint(x, y, z) while importing the data. Currently I'm using a trigger for this purpose, but it slows the import down significantly.
CREATE OR REPLACE FUNCTION create_coordinates()
RETURNS trigger AS
$$
BEGIN
    -- Back-fill the coordinate for the row that was just inserted
    UPDATE mytable SET coordinate = ST_MakePoint(NEW.x, NEW.y, NEW.z) WHERE id = NEW.id;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER create_coordinates_trigger
AFTER INSERT
ON mytable
FOR EACH ROW
EXECUTE PROCEDURE create_coordinates();
Is there a better approach for creating custom column values while importing data?
Starting in PostgreSQL v12, you can use a generated column:
CREATE TABLE mytable2 (
    id integer,
    x double precision,
    y double precision,
    z double precision,
    coordinate geometry GENERATED ALWAYS AS (ST_MakePoint(x, y, z)) STORED,
    CONSTRAINT mytable2_pkey PRIMARY KEY (id)
);
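Note that you can't supply a value for a generated column, so any column list you pass to \COPY must name only the base columns (with no column list at all, PostgreSQL skips generated columns automatically). Reusing the placeholder file name from the question:

-- 'points.csv' is a placeholder; the generated coordinate column is omitted
\copy mytable2 (id, x, y, z) from 'points.csv' with (format csv)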
In my hands, this seems to be about 6 times faster to bulk load than the trigger version (and even more so for very large imports).
And if you don't want to use this newish feature, just turning the AFTER trigger into a BEFORE trigger is also several times faster. Rather than issuing an UPDATE after the fact, you alter the row before it is ever inserted:
NEW.coordinate := ST_MakePoint(NEW.x, NEW.y, NEW.z);
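Put together, a sketch of the BEFORE version (same function name as yours; only the body and the trigger timing change):

CREATE OR REPLACE FUNCTION create_coordinates()
RETURNS trigger AS
$$
BEGIN
    -- Modify the row in flight; no second write per row
    NEW.coordinate := ST_MakePoint(NEW.x, NEW.y, NEW.z);
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER create_coordinates_trigger
BEFORE INSERT ON mytable
FOR EACH ROW
EXECUTE PROCEDURE create_coordinates();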