Search code examples
postgresqlsql-insertcommon-table-expressionsql-returning

Insert into two referencing tables by selecting from a single table


I have 2 permanent tables in my PostgreSQL 12 database with a one-to-many relationship (thing, and thing_identifier). The second -- thing_identifier -- has a column referencing thing, such that thing_identifier can hold multiple, external identifiers for a given thing:

CREATE TABLE IF NOT EXISTS thing
(
    thing_id SERIAL PRIMARY KEY,
    thing_name TEXT, --this is not necessarily unique
    thing_attribute TEXT --also not unique 
);

CREATE TABLE IF NOT EXISTS thing_identifier
(
    id SERIAL PRIMARY KEY,
    thing_id integer references thing (thing_id),
    identifier text
);

I need to insert some new data into thing and thing_identifier, both of which come from a table I created by using COPY to pull the contents of a large CSV file into the database, something like:

CREATE TABLE IF NOT EXISTS things_to_add
(
    id SERIAL PRIMARY KEY,
    guid TEXT, --a unique identifier used by the supplier
    thing_name TEXT, --not unique
    thing_attribute TEXT --also not unique

);

Sample data:

INSERT INTO things_to_add (guid, thing_name) VALUES 
  ('[111-22-ABC]','Thing-a-ma-jig','pretty thing'),
  ('[999-88-XYZ]','Herk-a-ma-fob','blue thing');

The goal is to have each row in things_to_add result in one new row, each, in thing and thing_identifier, as in the following:

thing:

| thing_id | thing_name          |  thing attribute  |
|----------|---------------------|-------------------|
|     1    | thing-a-ma-jig      |  pretty thing
|     2    | herk-a-ma-fob       |  blue thing

thing_identifier:

| id | thing_id | identifier       |
|----|----------|------------------|
|  8 |     1    | '[111-22-ABC]'   |
|  9 |     2    | '[999-88-XYZ]'   |

I could use a CTE INSERTstatement (with RETURNING thing_id) to get the thing_id that results from the INSERT on thing, but I can't figure out how to get both that thing_id from the INSERT on thing and the original guid from things_to_add, which needs to go into thing_identifier.identifier.

Just to be clear, the only guaranteed unique column in thing is thing_id, and the only guaranteed unique column in things_to_add is id (which we don't want to store) and guid (which is what we want in thing_identifier.identifier), so there isn't any way to join thing and things_to_add after the INSERT on thing.


Solution

  • You can retrieve the thing_to_add.guid from a JOIN :

    WITH list AS
    (
      INSERT INTO thing (thing_name)
      SELECT thing_name
        FROM things_to_add
      RETURNING thing_id, thing_name
    )
    INSERT INTO thing_identifier (thing_id, identifier)
    SELECT l.thing_id, t.guid
      FROM list AS l
     INNER JOIN thing_to_add AS t
        ON l.thing_name = t.thing_name
    

    Then, if thing.thing_name is not unique, the problem is more tricky. Updating both tables thing and thing_identifier from the same trigger on thing_to_add may solve the issue :

    CREATE OR REPLACE FUNCTION after_insert_thing_to_add ()
    RETURNS TRIGGER LANGUAGE sql AS
    $$
    WITH list AS
    (
      INSERT INTO thing (thing_name)
      SELECT NEW.thing_name
      RETURNING thing_id
    )
    INSERT INTO thing_identifier (thing_id, identifier)
    SELECT l.thing_id, NEW.guid
      FROM list AS l ;
    $$
    
    DROP TRIGGER IF EXISTS after_insert ON thing_to_add ;
    CREATE TRIGGER after_insert 
      AFTER INSERT
      ON thing_to_add 
      FOR EACH ROW
      EXECUTE PROCEDURE after_insert_thing_to_add ();