postgresql, greatest-n-per-group, distinct-on

Selecting rows ordered by some column and distinct on another


Related to: PostgreSQL DISTINCT ON with different ORDER BY

I have a table purchases (id, product_id, purchased_at, address_id)

Sample data:

| id | product_id |   purchased_at    | address_id |
| 1  |     2      | 20 Mar 2012 21:01 |     1      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 3  |     2      | 20 Mar 2012 21:39 |     2      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

The result I expect is the most recently purchased product (full row) for each address_id, and that result must be sorted in descending order by the purchased_at field:

| id | product_id |   purchased_at    | address_id |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |

Using query:

SELECT DISTINCT ON (address_id) purchases.address_id, purchases.*
FROM "purchases"
WHERE "purchases"."product_id" = 2
ORDER BY purchases.address_id ASC, purchases.purchased_at DESC

I'm getting:

| id | product_id |   purchased_at    | address_id |
| 2  |     2      | 20 Mar 2012 21:33 |     1      |
| 4  |     2      | 20 Mar 2012 21:48 |     2      |

So the rows are the same, but the order is wrong. Is there any way to fix it?


Solution

  • Quite a clear question :)

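    -- Anti-join: keep a row only if no later purchase exists
    -- for the same address and product.
    SELECT t1.* FROM purchases t1
    LEFT JOIN purchases t2
    ON t1.address_id = t2.address_id
    AND t1.product_id = t2.product_id
    AND t1.purchased_at < t2.purchased_at
    WHERE t1.product_id = 2 AND t2.purchased_at IS NULL
    ORDER BY t1.purchased_at DESC;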
    

    And most likely a faster approach:

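    -- Find the latest purchase time per address, then join back for the full row.
    SELECT t1.* FROM purchases t1
    JOIN (
        SELECT address_id, max(purchased_at) AS max_purchased_at
        FROM purchases
        WHERE product_id = 2
        GROUP BY address_id
    ) t2
    ON t1.address_id = t2.address_id AND t1.purchased_at = t2.max_purchased_at
    WHERE t1.product_id = 2
    ORDER BY t1.purchased_at DESC;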
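
    A third option keeps the DISTINCT ON from the question. DISTINCT ON requires its expressions to match the leftmost ORDER BY expressions, so the final sort has to happen in an outer query. The sketch below is a minimal, self-contained example: the temporary table, its column types, and the alias latest are assumptions made up for illustration, not taken from the real schema.

    ```sql
    -- Hypothetical scratch table mirroring the sample data (column types are assumed).
    -- As a TEMP table it shadows any real "purchases" table for this session only.
    CREATE TEMP TABLE purchases (
        id           integer PRIMARY KEY,
        product_id   integer NOT NULL,
        purchased_at timestamp NOT NULL,
        address_id   integer NOT NULL
    );

    INSERT INTO purchases (id, product_id, purchased_at, address_id) VALUES
        (1, 2, '2012-03-20 21:01', 1),
        (2, 2, '2012-03-20 21:33', 1),
        (3, 2, '2012-03-20 21:39', 2),
        (4, 2, '2012-03-20 21:48', 2);

    -- Inner query: DISTINCT ON keeps the first row of each address_id group,
    -- which is the newest purchase because of the inner ORDER BY.
    -- Outer query: re-sort those rows by purchase time, newest first.
    SELECT *
    FROM (
        SELECT DISTINCT ON (address_id) *
        FROM purchases
        WHERE product_id = 2
        ORDER BY address_id, purchased_at DESC
    ) AS latest
    ORDER BY purchased_at DESC;
    ```

    With the sample rows this returns id 4 first, then id 2, which is the order asked for in the question.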