Search code examples
sqlpostgresqltriggerssql-updateplpgsql

UPDATE ... FROM ... inside trigger function not working as expected?


I am writing a PostgreSQL trigger function that will allow me to do an ETL transformation on a data. I have a table with fields (id, price, EFF_FROM, EFF_TO), and if I only insert (id, price, EFF_FROM), EFF_TO needs to be calculated automatically as the next EFF_FROM of this id (-1 day) if it exists, and "5999-12-31" if it does not.

To update the table, I am using UPDATE ... FROM inside of a trigger function, but it seems to update all EFF_TO values in the table with the last given EFF_FROM value.

The trigger function that I use looks like this:

CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
BEGIN
    UPDATE prices_schema.prices
    SET EFF_TO = subquery.next_eff_from
    FROM (
        SELECT COALESCE(
            LEAD(EFF_FROM, 1)
            OVER (
                ORDER BY EFF_FROM ASC
            ),
            TO_DATE('6000-00-00', 'YYYY-MM-DD')
        ) - 1 AS next_eff_from FROM prices_schema.prices
    ) AS subquery;
    
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE TRIGGER after_insert_prices
AFTER INSERT ON prices_schema.prices
FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();

Load some test data:

INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 100, '2017-01-12');
    
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 150, '2017-02-09');
    
INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
    VALUES(1, 125, '2017-01-27');

Execute:

SELECT * FROM prices_schema.prices
    ORDER BY EFF_FROM ASC;

Result:

id price EFF_FROM EFF_TO
1 100 2017-01-12 2017-01-26
1 125 2017-01-27 2017-01-26
1 150 2017-02-09 2017-01-26

EFF_TO obviously doesn't look as expected to. What amazes me more is that calling this query:

SELECT eff_from, COALESCE(
            LEAD(EFF_FROM, 1)
            OVER (
                ORDER BY EFF_FROM ASC
            ),
            TO_DATE('6000-00-00', 'YYYY-MM-DD')
        ) - 1 AS next_eff_from FROM prices_schema.prices;

Gives me the correct result:

EFF_FROM next_eff_from
2017-01-12 2017-01-26
2017-01-27 2017-02-08
2017-02-09 5999-12-31

I bet this has something to do with how triggers & trigger functions work, but I wasn't able to find the right thing. I would like to be pointed out in the right direction.


Solution

  • Turns out, I was only 2 lines away from one:

    CREATE OR REPLACE FUNCTION prices_schema.prices_etl() RETURNS TRIGGER AS $$
    BEGIN
        UPDATE prices_schema.prices p0
        SET eff_to = subquery.next_eff_from
        FROM (
            SELECT id, price, COALESCE(
                LEAD(eff_from, 1)
                OVER (
                    PARTITION BY id
                    ORDER BY eff_from ASC
                ),
                TO_DATE('6000-00-00', 'YYYY-MM-DD')
            ) - 1 AS next_eff_from FROM prices_schema.prices
        ) AS subquery
        WHERE subquery.id = p0.id AND subquery.price = p0.price;
        
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
    
    CREATE OR REPLACE TRIGGER after_insert_prices
    AFTER INSERT ON prices_schema.prices
    FOR EACH ROW EXECUTE PROCEDURE prices_schema.prices_etl();
    

    This now works as intended, even if multiple different id values are inserted. Execute:

    INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
        VALUES(1, 100, '2017-01-12');
        
    INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
        VALUES(1, 150, '2017-02-09');
        
    INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
        VALUES(1, 125, '2017-01-27');
        
    INSERT INTO prices_schema.prices(id, Price, EFF_FROM)
        VALUES (2, 900, '2016-05-05');
        
    SELECT * FROM prices_schema.prices;
    

    Result:

    id price eff_from eff_to
    1 100 2017-01-12 2017-01-26
    1 150 2017-02-09 5999-12-31
    1 125 2017-01-27 2017-02-08
    2 900 2016-05-05 5999-12-31

    Yes, I know it probably is still heavily unefficient, but it works. It seems to work for bulk inserts, too, but it wasn't my primary focus anyway since I'm inserting data via psycopg2, parsing .json files.

    What's more important is that it correctly updates two rows, if two are affected by an insert. Ex.: when a new row with price = 125 is inserted, eff_from fields need to be updated both for this row and for row with price = 100, since we're inserting in between the time intervals.