Search code examples
sqlpostgresqlgreatest-n-per-group

How to join a table with a row in another table having the highest id?


I have two Postgres tables: 'user' with unique rows, and 'address' where a single user can have multiple addresses. I want to join specific 'user' row with one row from 'address' where the address.id is the highest max(id).

I wrote the following query which works OK but I suspect that when there are many addresses for one user, this query will produce a really large intermediate aggregate before it returns the the final results. Is there a better way to write this query?

select "user".id, 
       "user".username,     
       "user".trader_id, 
       address.id,                                             
       address.street, 
       address.trader_id 
    from "user", address 
  where "user".id = 6 
      and  address.trader_id = "user".trader_id                         
 order by address.id desc
 limit 1;

Solution

  • Your query is pretty good already for getting a single user. Definitely much faster than running a window function over the whole address table in a subquery.
    Your query, only slightly simplified with a USING clause:

    SELECT trader_id, u.id AS user_id, u.username, a.id AS adr_id, a.street
    FROM   "user"  u
    JOIN   address a USING (trader_id)
    WHERE  u.id = 6
    ORDER  BY a.id DESC
    LIMIT  1;
    

    A multicolumn index on address(trader_id, id) will give you optimal performance. Plus, obviously, an index on "user"(id). (No "intermediate aggregate" like you apprehended with LIMIT 1!) See:

    Alternatively, the same technique in a LATERAL subquery. Works for retrieving one or more users:

    SELECT u.trader_id, u.id AS user_id, u.username, a.*
    FROM   "user"  u
    LEFT   JOIN LATERAL (
       SELECT a.id AS adr_id, a.street
       FROM   address a
       WHERE  a.trader_id = u.trader_id
       ORDER  BY a.id DESC
       LIMIT  1
       ) a ON true
    WHERE  u.id = 6;  -- or for more than just the one
    

    About LATERAL subqueries:

    Also using LEFT JOIN to preserve users without any address.

    If you are going to use a window function, use row_number() rather than rank(). Do it in a LATERAL subquery while only retrieving a single (or few) user(s), to only involve relevant rows. And, unless you run Postgres 16 or later, add the frame clause ROWS UNBOUNDED PRECEDING for performance:

    SELECT u.trader_id, u.id AS user_id, u.username, a.*
    FROM   "user"  u
    LEFT   JOIN LATERAL (
       SELECT id AS adr_id, street
            , row_number() OVER (ORDER BY id DESC ROWS UNBOUNDED PRECEDING) AS rn
       FROM   address a
       WHERE  a.trader_id = u.trader_id
       ) a ON a.rn = 1
    WHERE  u.id = 6;  -- or for more than one user
    

    Why ROWS UNBOUNDED PRECEDING? See:

    Or use DISTINCT ON:

    SELECT DISTINCT ON (traider_id)
           trader_id, u.id AS user_id, u.username, a.id AS adr_id, a.street
    FROM   "user"  u
    JOIN   address a USING (trader_id)  -- or LEFT JOIN?
    WHERE  u.id = 6
    ORDER  BY trader_id, a.id DESC;
    

    See:

    __

    Aside: Rather don't use reserved words like "user" as identifier.