I'm trying to write an Ecto query where I am querying for many records and for each record wish to preload only the most recent record from a related table; however, things aren't quite working out as I'd like. Say, for example, I have a data model with many users
where each has many comments
- the result I'm trying to achieve is a list of all users where each has only their most recent published comment preloaded.
The closest I've been able to muster is
# assume that my User model has a `has_many(:comments)` relationship defined
defp preload_most_recent_comment(_user_ids) do
Comment
|> where([c], c.status == :published)
|> last(:inserted_at)
|> Repo.all()
end
def get_users_with_most_recent_comment() do
User
|> preload([comments: ^(&preload_most_recent_comment/1)])
|> Repo.all()
end
however, the problem here is that it doesn't preload each user's most recent published comment - it only preloads the single most recent published comment and puts it on the associated user... so the results look something like
[
%User{
id: 1,
comments: [%{id: 1, status: "published", inserted_at: ~U[2022-01-01 00:00:00]}]
},
%User{
id: 2,
comments: []
}
]
even though user 2
has a published comment. If I remove the last
statement, then the preload works essentially the way I'd like it to, albeit that it returns multiple comments for each user, rather than just the most recent.
What is the appropriate way to adjust this query so that each preload is not only filtered by something like status
, but also limited to only the most recent record?
Ok, a lot of sleuthing and trial & error led me to a solution using a windowing function... here's how it all went down in the end. EDIT: the query below actually wasn't quite correct - it would actually return multiple comments per user - leaving it for posterity and comparison, but see the revised version below that for the "real" working version...
defp most_recent_comment() do
Comment
|> select([c], %{c | inserted_at: over(last_value(c.inserted_at), :user)})
|> windows([c], [user: [partition_by: c.user_id, order_by: [desc: c.inserted_at]]])
end
defp preload_published_comments(_) do
Comment
|> where([c], c.status == :published)
|> join(:inner, [c], rc in subquery(most_recent_comment()), on: c.id == rc.id)
|> Repo.all()
end
def get_users_with_most_recent_published_comment() do
User
|> preload([comments: ^(&preload_published_comments/1)])
|> Repo.all()
end
Ok, here's the "correct" version, making use of row_number()
and moving the where
clause to the subquery -
defp most_recent_published_comment() do
Comment
|> where([c], c.status == :published)
|> select([c], %{id: c.id, row_number: over(row_number(), :user_id_partition)})
|> windows([_], [user_id_partition: [partition_by: :user_id, order_by: [desc: :inserted_at]]])
end
defp preload_published_comments(_) do
Comment
|> join(:inner, [c], rc in subquery(most_recent_published_comment()),
on: c.id == rc.id and rc.row_number == 1
)
|> Repo.all()
end
def get_users_with_most_recent_published_comment() do
User
|> preload([comments: ^(&preload_published_comments/1)])
|> Repo.all()
end
There is documentation around this at https://hexdocs.pm/ecto/Ecto.Query.html#preload/3-preload-queries, but I still had to muck about for a while. Hope this is helpful!
Other useful windowing documentation from Ecto - https://hexdocs.pm/ecto/Ecto.Query.WindowAPI.html