Search code examples
sqlpostgresqlgreatest-n-per-group

How to query the original and updated description of an item


Let's say I have 2 tables (Postgres if it matters):

CREATE TABLE items 
(
    item_id bigint BIGSERIAL,
    description character varying(2000) NOT NULL,
    date TIMESTAMP default now(),
    -- (many more columns)
    CONSTRAINT "item_pk" PRIMARY KEY ("item_id")
);

CREATE TABLE item_updates 
(
    update_id bigint BIGSERIAL,
    item_id_fk bigint NOT NULL,
    date TIMESTAMP default now(),
    -- (more columns)
    description character varying(2000) NOT NULL,

    CONSTRAINT "item_updates_pk" 
        PRIMARY KEY ("update_id"),
    CONSTRAINT "item_updates_item_fk" 
        FOREIGN KEY (item_id_fk) REFERENCES items(item_id) NOT DEFERRABLE,
);

When I insert a new item in table items the description is set. When the description changes, an update is inserted in table item_updates that references the item_id and with that the description is changed and history preserved.

I know how to find the latest description for a given item_id in table item_updates, but where I stuck is the following; a query that gives me the latest description for an item.

For an item_id that has no updates in table item_updates, it should show the description and (many more columns) from table items.

For an item that has updated in table item_updates, it should show the latest description from table item_updates including (more columns) and the corresponding (many more columns) from table items.

The resulting query should contain all item_id entries (+ columns) and corresponding latest item_updates columns.

What I have now is this:

SELECT 
    COALESCE(i.description) AS desccription, *
FROM 
    items i 
LEFT JOIN 
    (SELECT 
         itm.item_id, MAX(upd.item_id) AS uid 
     FROM 
         items itm 
     LEFT JOIN 
         item_updates upd ON itm.item_id = upd.item_id_fk 
     GROUP BY 
         itm.item_id) AS sub ON sub.item_id = i.item_id;

The COALESCE is there because tried COALESCE(sub.description, i.description) but that was not allowed. The result of this query is incorrect however. The error says it can 'see' table sub in the first SELECT statement.

I see many questions and answers when all info is in the same table, but I could not find (or did not have the right search terms) this particular scenario. The answer that CoPilot gave me was also wrong.


Solution

  • Fixed Setup

    CREATE TABLE items (
      item_id     bigserial PRIMARY KEY
    , inserted_at timestamp NOT NULL DEFAULT now()
    , description varchar(2000) NOT NULL
    -- , many more columns
    );
    
    CREATE TABLE item_updates (
      update_id   bigserial PRIMARY KEY
    , item_id_fk  bigint NOT NULL
    , updated_at  timestamp NOT NULL DEFAULT now()
    , description varchar(2000) NOT NULL
    -- , more columns
    , CONSTRAINT item_updates_item_fk FOREIGN KEY (item_id_fk) REFERENCES items
    );
    

    Fixed a couple of syntax issues. And I don't use the name "date" for a timestamp column, that would be self-sabotage. Using inserted_at and updated_at instead. And more often than not, you really want timestamptz, not timestamp. See:

    Query

    All items:

    SELECT i.item_id, COALESCE(u.description, i.description) AS description
     --  , i.column1, u.column2  -- whatever
    FROM   items i 
    LEFT   JOIN (
       SELECT DISTINCT ON (u.item_id_fk)
              *  --  or whatever you want
       FROM   item_updates u
       ORDER  BY u.item_id_fk, u.update_at DESC
       ) u ON u.item_id_fk = i.item_id;
    

    Given item_id:

    SELECT i.item_id, COALESCE(u.description, i.description) AS description
     --  , i.column1, u.column2  -- whatever
    FROM   items i 
    LEFT   JOIN LATERAL (
       SELECT *  --  or whatever you want
       FROM   item_updates u
       WHERE  u.item_id_fk = i.item_id
       ORDER  BY  u.update_at DESC
       LIMIT  1
       ) u ON true
    WHERE  i.item_id = 123;  -- your input here
    

    fiddle

    Related:

    About DISTINCT ON:

    And how to optimize performance for certain data distributions:

    About LATERAL subqueries: