Search code examples
sqlpostgresqlsql-functiongenerated-columns

Recompute values in Postgres generated column after underlying function changed


Image a table with a generated column like this:

CREATE OR REPLACE FUNCTION extract_first_name(p_name text) RETURNS text
LANGUAGE SQL IMMUTABLE
AS $$
    SELECT split_part(p_name, ' ', 1);
$$;

CREATE TABLE customers (
    id serial,
    name text,
    first_name text GENERATED ALWAYS AS (extract_first_name(name)) STORED
);

Later someone finds that the extract_first_name function is too simplistic and needs to be changed. It's then updated, but the values in the first_name column stays the same. How do we in the simplest and most efficient way recompute all the values in the first_name column to use the latest version of the function w/o locking the table?


Solution

  • A generated column is 'computed' on 2 events. When the row is Inserted and when the underlying column is Updated. Thus you need to Update every row with something like

    Update customers 
       set name = name; 
    

    This will, as needed, update every row in the table. But as Postgres uses a MVCC model this will not block others from selecting during the operation.

    The main advantage of using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of an innovative Serializable Snapshot Isolation (SSI) level.

    However, you will need to lookout for deadlocks if you continue to allow updates during this process.