Search code examples
postgresql-15

INSERT into table ON CONFLICT increase other column value -> column reference is ambigous? SOLVED


I have a large sourceTable (300 Million rows) with several columns:

id sequence orientation gc_content – etc

most sequences appear only once in the table, but others might be in there much more often

I need to know how often each sequence appears in the sourceTable, so I created a table targetTable with 3 columns:

id sequence frequency

The column sequence has been set to be unique.

I filled this targetTable with all distinct sequence values and set the frequency to 0 by default in each.

Now I want to select all the sequences from the sourceTable and "insert" them into the targetTable (which can't happen) and ON CONFLICT increase the frequency for that sequence by 1 (not sure that is the most intelligent way of going about it).

The code works fine for the first part

INSERT INTO targetTable (sequence, frequency)
(SELECT sequence, 1
FROM sourceTable)
ON CONFLICT (sequence)
DO NOTHING

but when I try

INSERT INTO targetTable (sequence, frequency)
(SELECT sequence, 1
FROM sourceTable)
ON CONFLICT (sequence)
--DO NOTHING
DO UPDATE SET
frequency = targetTable.frequency + 1
WHERE
sequence = targetTable.sequence -- = sourceTable.sequence
;

I get the message that sequence is ambiguous.

To make sequence not ambiguous I tried

WHERE
targetTable.sequence = sourceTable.sequence

and

WHERE
sourceTable.sequence = targetTable.sequence 

but then I get "ERROR: missing FROM-clause entry for table "sourceTable" "

What am I missing here?

Thanks for any advice - I would have liked to figure that one out on my own as it seems like a common problem that should be easy to research on the internet, but I banged my head against that since yesterday morning and am running out of time …


Solution

  • The virtual table associated with failed INSERT rows is named EXCLUDED, so the SQL would be as follows, except that each row can only be affected once:

    INSERT INTO targetTable (sequence, frequency)
    SELECT sequence, 1
      FROM sourceTable
    ON CONFLICT (sequence)
      DO UPDATE SET frequency = targetTable.frequency + 1
      WHERE EXCLUDED.sequence = targetTable.sequence;
    

    This is documented in PostgreSQL 15 Documention - INSERT SQL Command. Note, the SELECT clause didn't need to be enclosed in parentheses. Avoid using sequence as a column identifier since it is a SQL key word.

    A proper approach to get the counts would be to use the COUNT aggregate function grouped by sequence:

    INSERT INTO targetTable(sequence, frequency)
    SELECT sequence, COUNT(*) AS frequency
      FROM sourceTable
      GROUP BY sequence;