Search code examples
sqlpostgresqllimitgreatest-n-per-groupwhere-in

Postgres limit number of rows for each in WHERE IN id from another table


I have a messaging app where I need to return all of the conversations that a user is part of and the messages associated with each one. I'd like to limit the number of messages per conversation.

Table structure is as follows:

Users

| id   | name | email    | created_at |
|------|------|----------|------------|
| 1    | Bob  | [email protected]  | timestamp  |
| 2    | Tom  | [email protected]  | timestamp  |
| 3    | Mary | [email protected]  | timestamp  |

Messages

| id   | sender_id | conversation_id  | message | created_at |
|------|-----------|------------------|---------|------------|
| 1    | 1         | 1                | text    | timestamp  |
| 2    | 2         | 2                | text    | timestamp  |
| 3    | 2         | 1                | text    | timestamp  |
| 4    | 3         | 3                | text    | timestamp  |

Conversations

| id | created_at |
|----|------------|
| 1  | timestamp  |
| 2  | timestamp  |
| 3  | timestamp  |

Conversations_Users

| id | user_id | conversation_id |
|----|---------|-----------------|
| 1  | 1       | 1               |
| 2  | 2       | 1               |
| 3  | 2       | 2               |
| 3  | 3       | 2               |
| 4  | 3       | 3               |
| 5  | 1       | 3               |

I want to load up all the conversations that user (id 1) is in (In the case of the example - conversation 1 and 3). For each conversation I need the messages associated with it, grouped by conversation_id, ordered by created_at ASC. My current query handles this:

SELECT
    *
FROM
    messages
WHERE
    conversation_id IN (
        SELECT
            conversation_id
        FROM
            conversations_users
        WHERE
            user_id = 1
    )
ORDER BY
    conversation_id, created_at ASC;

However, that would stick a lot of data into memory. Therefore, I'd like to limit the number of messages per conversation.

I've looked at rank() and ROW_NUMBER() but am unsure how to implement them/if they are what's needed.


Solution

  • You can indeed use row_number(). The following query will give you the last 10 messages per conversation of the given user:

    select *
    from (
        select 
            m.*, 
            row_number() over(
                partition by cu.user_id, m.conversation_id 
                order by m.created_at desc
            ) rn
        from messages m
        inner join conversations_users cu 
            on  cu.conversation_id  = m.conversation_id 
            and cu.user_id = 1
    ) t
    where rn <= 10
    order by conversation_id, created_at desc
    

    Notes:

    • I turned the subquery with in to a regular join, since I believe that it is a neater way to express your requirement

    • I added the id of the user to the partitioning clause; so, if you remove the where clause that filters on the user, you get the 10 last messages of the conversations of each and every user