Tags: mysql, sql, postgresql, greatest-n-per-group, distinct-on

Selecting the most recent, lowest price from multiple vendors for an inventory item


I’m fairly proficient at SQL, but this question has had me stumped for quite a while now. In the most basic sense, there are just two tables:

Items
+----+--------+
| id | title  |
+----+--------+
|  1 | socks  |
|  2 | banana |
|  3 | watch  |
|  4 | box    |
|  5 | shoe   |
+----+--------+

...and the prices table:

Prices
+---------+-----------+-------+------------+
| item_id | vendor_id | price | created_at |
+---------+-----------+-------+------------+
|       1 |         1 | 5.99  | Today      |
|       1 |         2 | 4.99  | Today      |
|       2 |         1 | 6.99  | Today      |
|       2 |         2 | 6.99  | Today      |
|       1 |         1 | 3.99  | Yesterday  |
|       1 |         1 | 4.99  | Yesterday  |
|       2 |         1 | 6.99  | Yesterday  |
|       2 |         2 | 6.99  | Yesterday  |
+---------+-----------+-------+------------+

(Please note: created_at is actually a timestamp; the words “Today” and “Yesterday” are used merely to convey the concept quickly.)
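
For anyone who wants to reproduce the setup, the two tables above could be created roughly as follows. This is only a sketch: the column types, the foreign key, and the concrete timestamps are assumptions chosen for illustration and are not part of the original schema.

```sql
-- Hypothetical schema; the column types are assumptions.
CREATE TABLE items (
    id    integer PRIMARY KEY,
    title text    NOT NULL
);

CREATE TABLE prices (
    item_id    integer       NOT NULL REFERENCES items (id),
    vendor_id  integer       NOT NULL,
    price      numeric(10,2) NOT NULL,
    created_at timestamptz   NOT NULL DEFAULT now()
);

INSERT INTO items (id, title) VALUES
    (1, 'socks'), (2, 'banana'), (3, 'watch'), (4, 'box'), (5, 'shoe');

-- Placeholder timestamps: 2024-01-02 stands in for "Today",
-- 2024-01-01 for "Yesterday".
INSERT INTO prices (item_id, vendor_id, price, created_at) VALUES
    (1, 1, 5.99, '2024-01-02 09:00+00'),
    (1, 2, 4.99, '2024-01-02 09:00+00'),
    (2, 1, 6.99, '2024-01-02 09:00+00'),
    (2, 2, 6.99, '2024-01-02 09:00+00'),
    (1, 1, 3.99, '2024-01-01 09:00+00'),
    (1, 1, 4.99, '2024-01-01 15:00+00'),
    (2, 1, 6.99, '2024-01-01 09:00+00'),
    (2, 2, 6.99, '2024-01-01 09:00+00');
```

Note that items 3 to 5 have no rows in prices at all.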

My goal is a simple result containing each inventory item together with its most recent, lowest price, including the vendor_id of the vendor offering that price.

The stumbling block appears to be the sheer number of requirements the statement (or statements) must handle:

  • Each item has multiple vendors, so we need to determine which price among all the vendors for each item is the lowest
  • New prices for the items get appended regularly, so we only want to consider the most recent price for each item from each vendor
  • We want to roll all that up into a single result, one item per row, including the item, price and vendor

It seems simple, but I’ve found this problem to be uncannily difficult.

As a note, I’m using Postgres, so all the fanciness it provides is available for use (e.g. window functions).


Solution

  • Much simpler with DISTINCT ON in Postgres:

    Current price per item for each vendor

    SELECT DISTINCT ON (p.item_id, p.vendor_id)
           i.title, p.price, p.vendor_id
    FROM   prices p
    JOIN   items  i ON i.id = p.item_id
    ORDER  BY p.item_id, p.vendor_id, p.created_at DESC;
    

    Optimal vendor for each item

    SELECT DISTINCT ON (item_id) 
           i.title, p.price, p.vendor_id -- add more columns as you need
    FROM (
       SELECT DISTINCT ON (item_id, vendor_id)
              item_id, price, vendor_id -- add more columns as you need
       FROM   prices p
       ORDER  BY item_id, vendor_id, created_at DESC
       ) p
    JOIN   items i ON i.id = p.item_id
    ORDER  BY item_id, price;
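
    Since the question mentions window functions, the same two-step logic can also be sketched with row_number(), which is standard SQL rather than Postgres-specific. This is an alternative formulation, not part of the original answer; the CTE names (latest, cheapest) and the rank column names are made up for illustration, and vendor_id is added as a tie-breaker so that the result is deterministic when two vendors offer the same price.

    ```sql
    WITH latest AS (
        -- Step 1: rank each vendor's prices per item, newest first.
        SELECT item_id, vendor_id, price,
               row_number() OVER (PARTITION BY item_id, vendor_id
                                  ORDER BY created_at DESC) AS recency_rank
        FROM   prices
    ), cheapest AS (
        -- Step 2: among the current prices only, rank vendors per item
        -- by price; vendor_id breaks ties (an assumption, not a requirement
        -- stated in the question).
        SELECT item_id, vendor_id, price,
               row_number() OVER (PARTITION BY item_id
                                  ORDER BY price, vendor_id) AS price_rank
        FROM   latest
        WHERE  recency_rank = 1
    )
    SELECT i.title, c.price, c.vendor_id
    FROM   cheapest c
    JOIN   items    i ON i.id = c.item_id
    WHERE  c.price_rank = 1
    ORDER  BY c.item_id;
    ```

    With the sample rows from the question, this yields socks at 4.99 from vendor 2 and banana at 6.99 from vendor 1 (the tie with vendor 2 is resolved by the lower vendor_id). Items without any price rows do not appear, because the join to items is an inner join.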
    

    ->SQLfiddle demo

    Detailed explanation in the related question:
    "Select first row in each GROUP BY group?"