Search code examples
sqlpostgresqlelixirphoenix-frameworkecto

Using LATERAL joins in Ecto v2.0


I'm trying to join the latest comment on a post record, like so:

comment = from c in Comment, order_by: [desc: c.inserted_at], limit: 1

post = Repo.all(
  from p in Post,
  where: p.id == 123,
  join: c in subquery(comment), on: c.post_id == p.id,
  select: [p.title, c.body],
  limit: 1
)

Which generates this SQL:

SELECT p0."title", 
       c1."body" 
FROM   "posts" AS p0 
       INNER JOIN (SELECT p0."id", 
                          p0."body", 
                          p0."inserted_at", 
                          p0."updated_at" 
                   FROM   "comments" AS p0 
                   ORDER  BY p0."inserted_at" DESC 
                   LIMIT  1) AS c1 
               ON c1."post_id" = p0."id" 
WHERE  ( p0."id" = 123 ) 
LIMIT  1 

It just returns nil. If I remove the on: c.post_id == p.id it'll return data, but obviously it'll return the lastest comment for all posts, not the post in question.

What am I doing wrong? A fix could be to use a LATERAL join subquery, but I can't figure out whether it's possible to pass the p reference into a subquery.

Thanks!


Solution

  • The issue was caused by the limit: 1 here:

    comment = from c in Comment, order_by: [desc: c.inserted_at], limit: 1
    

    Since the resulting query was SELECT * FROM "comments" AS p0 ORDER BY p0."inserted_at" DESC LIMIT 1, it was only returning the most recent comment on ANY post, not the post I was querying against.

    FYI the query was >150ms with ~200,000 comment rows, but that was brought down to ~12ms with a simple index:

    create index(:comments, ["(inserted_at::date) DESC"])
    

    It's worth noting that while this query works in returning the post in question and only the most recent comment, it'll actually return $number_of_comments rows if you remove the limit: 1. So say if you wanted to retrieve all 100 posts in your database with the most recent comment of each, and you had 200,000 comments in the database, this query would return 200,000 rows. Instead you should use a LATERAL join as discussed below.

    .

    Update

    Unfortunately ecto doesn't support LATERAL joins right now.

    An ecto fragment would work great here, however the join query wraps the fragment in additional parentheses (i.e. INNER JOIN (LATERAL (SELECT …))), which isn't valid SQL, so you'd have to use raw SQL for now:

    sql = """
    
      SELECT p."title", 
             c."body" 
      FROM   "posts" AS p 
             INNER JOIN LATERAL (SELECT c."id", 
                                        c."body", 
                                        c."inserted_at" 
                         FROM   "comments" AS c 
                         WHERE  ( c."post_id" = p."id" ) 
                         ORDER  BY c."inserted_at" DESC 
                         LIMIT  1) AS c 
                     ON true 
      WHERE  ( p."id" = 123 ) 
      LIMIT  1
    
    """
    
    res = Ecto.Adapters.SQL.query!(Repo, sql, [])
    

    This query returns in <1ms on the same database.

    Note this doesn't return your Ecto model struct, just the raw response from Postgrex.