Search code examples
sqlpostgresqlgreatest-n-per-group

Build a latest table from a history table using max()


I have two tables, a mls_history table, and a mls table. The mls_history table has every record uploaded to the system, and the mls table is a subset of mls_history, only containing the latest status date (stat_date).

For example:

MLS_HISTORY:

mlsnum | stat_date
-------------------
1      | 2013-01-04
2      | 2013-01-08
1      | 2013-04-09
1      |

MLS:

mlsnum | stat_date
-------------------
1      | 2013-04-09
2      | 2013-01-08

I want one insert/select query (if possible?) that grabs the max stat_date for each mlsnum, and stores it in the mls table.

There are a few examples already on stackoverflow, but almost all of them allow for multiple records if stat_dates are the same. I want this to only return 1 record, even if 2 meet the max criteria.

To further complicate things, there are no serial columns, so no unique id, so I can't just say "in case of a tie, pick the one with the max id".

Is this possible?


Solution

  • INSERT INTO mls                -- assuming it's empty
    SELECT DISTINCT ON (mlsnum) *
    FROM   mls_history
    ORDER  BY mlsnum, stat_date DESC;
    

    Select exactly 1 row per mlsnum with the latest stat_date.

    Details and explanation:
    Select first row in each GROUP BY group?