Search code examples
sqlpostgresqldateselectgreatest-n-per-group

SQL WHERE IN with limited and sorted results per IN id


I didn't quite know how to describe this, so please excuse the title...

I'm trying to write a function to retrieve the most recent message belonging to a conversation, when provided with an array of conversation IDs.

My "message" table looks as follows:

id | conversationId | body           | createdDate
-------------------------------------------------------
1  | 1              | Hello          | 2020-06-09 01:01
2  | 1              | How are you?   | 2020-06-09 01:02
3  | 2              | Hi             | 2020-06-09 01:02
4  | 1              | I'm good, you? | 2020-06-09 01:03
5  | 2              | Hey there!     | 2020-06-09 01:04

I got as far as to query:

SELECT * FROM "message" WHERE "conversationId" IN (1, 2) ORDER BY "createdDate" DESC

As expected this returns all of the messages, which match the conversation id's provided, sorted by most recent createdDate.

I'm a little confused at how I can LIMIT to only include the first result (most recent createdDate) for each conversationId.

The output I'm looking for would be:

id | conversationId | body           | createdDate
-------------------------------------------------------
4  | 1              | I'm good, you? | 2020-06-09 01:03
5  | 2              | Hey there!     | 2020-06-09 01:04

Solution

  • That's a typical greatest-n-per-group problem. In Postgres, a simple and efficient way to address this is distinct on:

    select distinct on (conversationId) m.*
    from messages m
    -- where conversationId in (...)    -- if needed
    order by conversationId, createdDate desc