Search code examples
postgresqljointransactional

Conditional insert of new records into transaction table


I have a standing table of historical transactions and I would like to insert records from a new table where the status of the new record is different from the old at the time that the new record was "valid" (based on a valid_from column in both tables).

For example, take the standing/transactional table (I'll call it old):

CREATE TABLE old (id serial, key text, valid_from date, status text);
INSERT INTO old (key, valid_from, status)
VALUES
('A', '2014-01-01', 'x'),
('A', '2014-02-01', 'y'),
('A', '2014-03-01', 'z'),
('B', '2014-01-10', 'x'),
('B', '2014-02-15', 'y');

SELECT * FROM old;

 id | key | valid_from | status
----+-----+------------+--------
  1 | A   | 2014-01-01 | x
  2 | A   | 2014-02-01 | y
  3 | A   | 2014-03-01 | z
  4 | B   | 2014-01-10 | x
  5 | B   | 2014-02-15 | y

And the new table of updates:

CREATE TABLE new (id serial, key text, valid_from date, status text);
INSERT INTO new (key, valid_from, status)
VALUES
('A', '2014-01-15', 'x'),
('A', '2014-02-15', 'x'),
('A', '2014-03-15', 'z'),
('B', '2014-02-15', 'y');

SELECT * FROM new;

 id | key | valid_from | status
----+-----+------------+--------
  1 | A   | 2014-01-15 | x
  2 | A   | 2014-02-15 | x
  3 | A   | 2014-03-15 | z
  4 | B   | 2014-02-15 | y

Ideally, I would like to insert only row 2 of the new table because it has a different status than record with the same key in the old table with the most recent valid_from date prior to the new record. Specifically:

  • id = 1 in the new table should not be inserted. The maximum valid_from date in the old table for key A ('2014-01-01') prior to the new valid_from date ('2014-01-15'); i.e., they both have a status of x.
  • id = 2 in the new table should be inserted because its status (x) is different than the status in the old table (y) where the valid_from date in the old table was equal to the maximum valid_from for that key that is prior to the valid_from in the new table.
  • id = 3 should not be inserted because it shares the same status (x) as the record in the old table for the most recent valid_from
  • id = 4 should not be inserted because, again, it's status (y) is identical to the corresponding record in the old table.

After the insert, the old table should look like:

SELECT * FROM old ORDER BY key, valid_from;

 id | key | valid_from | status
----+-----+------------+--------
  1 | A   | 2014-01-01 | x
  2 | A   | 2014-02-01 | y
  6 | A   | 2014-02-15 | x
  3 | A   | 2014-03-01 | z
  4 | B   | 2014-01-10 | x
  5 | B   | 2014-02-15 | y

The goal: there should never be two successive records with the same key and status when sorted by ascending valid_from.

If the idea were to simply insert records with a key + status combination that did not exist in the old table, I would do something like this:

INSERT INTO old (key, valid_from, status)
SELECT new.key, new.valid_from, new.status
FROM new LEFT JOIN old USING(key, status)
WHERE old.key IS NULL;

However, this misses id 2 in the new table since id 1 in the old table shares the same status, even though there is a more recent record in the old table that does indeed differ.

Whatever I try, I can't seem to check the new record against only the most "recent" old record with a valid_from < the valid_from in the new record. I have a feeling I can do it with a combination of a window function and/or a DISTINCT ON to find the maximum valid_from, but I can't quite seem to put the two together. For instance, I've tried things like:

INSERT INTO old(key, valid_from, status)
SELECT n.key, n.valid_from, n.status
FROM new n LEFT JOIN (
    SELECT DISTINCT ON (old.key) old.*
    FROM new
        JOIN old
        ON old.key = new.key
        AND old.valid_from < new.valid_from
        ORDER BY old.key, old.valid_from DESC) o
ON
    n.key = o.key
    AND n.status = o.status
WHERE
    o.key IS NULL;

However, this winds up inserting the first record from the new table:

 id | key | valid_from | status
----+-----+------------+--------
  1 | A   | 2014-01-01 | x
  6 | A   | 2014-01-15 | x
  2 | A   | 2014-02-01 | y
  7 | A   | 2014-02-15 | x
  3 | A   | 2014-03-01 | z
  4 | B   | 2014-01-10 | x
  8 | B   | 2014-02-15 | y
  5 | B   | 2014-02-15 | y

I'm using PostgreSQL 9.3. Any pointers on an approach to this problem? Thanks in advance!


Solution

  • After some more toying, I think I have a solution (although probably not the best one). The logical sequence of steps would be:

    1. Join on the key, only where the new record's valid_from is greater than the old record's valid_from
    2. For each match, include a flag for whether the old record's valid_from is the most recent valid_from among all candidate matches (we'll call it is_most_recent). I did this with a window function (which can't be used in a WHERE clause).
    3. Ensure the new status does not equal the old status
    4. Insert these records

    Here's the best I've come up with so far:

    INSERT INTO old (key, valid_from, status)
    SELECT key, valid_from, status FROM (
        SELECT
            new.key, 
            new.valid_from, 
            new.status,
            -- Does the new record have a different statusf from the old?
            old.status != new.status AS is_different_status, 
            -- Are we comparing the most "recent" record in the old table?
            old.valid_from = MAX(old.valid_from) OVER (PARTITION BY old.key, new.id)
                AS is_most_recent
        FROM new JOIN old
            ON new.key = old.key
            AND new.valid_from >= old.valid_from) AS to_insert
    WHERE is_different_status AND is_most_recent;
    

    I'd love to know if there's a better solution out there.